ABSTRACT


Genetic algorithms are adaptive search procedures which 
employ analogies to the mechanisms of adaptation in natural 
evolution to provide solutions to a wide variety of search 
problems. This work analyzes these algorithms within the 
context of function optimization. First, a formal framework 
for search problems and methods is presented. It is seen 
that global function optimization is often concerned with 
problems arising from static complexity (e.g. multimodality) 
and a priori ignorance. Overall best performance is chosen 
as the measure which should be used to evaluate optimization 
methods for static, deterministic functions. 

Then the various features of genetic algorithms are 
described in detail. Eight experiments are conducted on a 
set of seven multimodal test functions to determine which 
specific genetic algorithm is best suited to (static)
function optimization. The experiments reveal that the use
of a larger memory than in previous work and the careful control of
stochastic error during an internal sampling process improve 
algorithm performance. Representations of problem solutions 
which derive from the genetic phenomenon of diploidy are 
shown to have little effect on performance. 

Finally, one experiment is designed to compare a
genetic algorithm to the best methods available for global
optimization of functions of unknown difficulty. Both the
genetic algorithm and a "direct search" numerical method are
superior to random search. The numerical method is impaired
by multimodality in Euclidean space. The genetic algorithm 
transforms points in Euclidean space into points in a space 
defined in terms of Hamming distance, and is impaired by 
nonlinearities in this space. There are many types of 
problems which do not require this transformation and have 
natural Hamming representations. Genetic algorithms using 
binary representations are probably better suited to such 
Hamming space problems than to functions defined over
Euclidean space.


CHAPTER 1


INTRODUCTION 


Adaptation has long been of interest to those involved 
in problem solving. In fields from numerical analysis to 
automatic learning to process control, it is recognized that 
if the problem is very difficult, an adaptive search method 
may offer the best hope of achieving a good solution. An 
adaptive search procedure is defined as any method which 
continually modifies the direction of future search on the 
basis of information received during the search process. A 
gradient technique in function optimization, then, is an
adaptive process, as is a computer program which "learns" to
improve its chess playing. 

Many biological systems are also adaptive in nature, 
including systems of evolution, neural systems, the immune 
system, and cell processes at the molecular level. An 
understanding of the techniques of natural adaptation can 
aid in the development of artificial systems. For example, 
knowledge of human learning behavior has influenced many 
computer learning techniques. The analysis of natural 
adaptation has also led to the development of genetic 
algorithms. 

A genetic algorithm is a generalized adaptive search 
procedure which employs many of the structures and
techniques of genetic systems. Research in genetic
algorithms was initiated by John Holland at the University
of Michigan in the late 1960's. Holland (1975) first
formalized the notion of adaptation by describing the 
features common to all adaptive systems. He then showed how 
the particular devices of genetic adaptation could be 
modelled within the formal framework to form a procedure 
applicable to many problem paradigms. The original 
specification, called a "Reproductive Plan", was actually an 
outline which could be implemented in several ways. Thus 
the Reproductive Plan actually defined a class of search 
procedures. 

As with any new group of problem-solving techniques, 
two questions have to be answered about the class of genetic 
algorithms. First, what is the domain of problems over 
which genetic algorithms are successful and superior to 
other known methods? Second, what specific members of this 
class are most desirable? In general, these questions 
cannot be answered independently, as each algorithm may have 
its own area of high performance. Also, the scope of these 
issues is large and a single thesis, or even a lifetime of 
research, will not resolve them. In order to provide even 
partial answers, it is necessary to have a clear framework 
relating search problems, adaptation, and genetic
algorithms.


1.1 A FORMALISM FOR SEARCH

The formal framework for adaptation developed by 
Holland can be extended into a formalization for search 
problems in general, regardless of the nature of the 
algorithms which might be employed. A search problem p
contains the following components:

S = {s}    the set of all possible solutions to the problem.
           This set may be infinite, even non-denumerable.

F:SxT-->R  a function which maps an element s in S at time t
           in T into the set of real numbers. This is the
           evaluation function for potential solutions to the
           problem. It may be a threshold function, where a
           solution either is accepted or rejected, or a
           probabilistic function, where a solution may have
           a certain likelihood of being good. These and
           other possibilities will be investigated later.


As a side note, it may be argued that F need not map into
the reals. For instance, if the problem is survival in a
given natural environment, S could be the set of possible
individual organisms, and F would be organism fitness, which
has no intuitive representation in real numbers. However,
any researcher attempting to compare the fitnesses of
various organisms first develops a mapping from organisms to
a partially ordered set, e.g. organisms to their life spans.
At this point, the survival problem has been rewritten into
a well-formulated search problem. This thesis is concerned
only with well-formulated problems and not with the issues
involved in the development of evaluation functions.


An analysis of search methods includes these elements:

P = {p}    the set of search problems of interest.

A = {a}    the set of algorithms which are under
           consideration for use in solving the problems.

X:AxP-->R  a function mapping the behavior of an algorithm on
           a problem into the real numbers. An algorithm "a"
           applied to a problem p can be characterized by
           1) the set {s_1, s_2, ...} of solution points
              searched, and
           2) the costs of operation of the algorithm (the
              resources consumed).
           The domain AxP is represented by these values.

U:A-->R    the global measure of the utility of an algorithm
           on the problem set P. For example, U may be the
           weighted average of an algorithm's performance
           X(a,p) over all problems p in P.
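
The four elements S, F, X and U can be written down as a small program. The following is only a sketch under stated assumptions: the names Problem, Trace, enumerate_all, X and U are hypothetical, S is taken to be a finite list of real numbers, X is taken to be the best value found, and cost is counted as the number of invocations of F.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Solution = float                                  # an element s of S (assumption: real-valued)
Evaluation = Callable[[Solution, int], float]     # F: S x T --> R


@dataclass
class Problem:
    """A search problem p: a (finite, for this sketch) set S and a function F."""
    candidates: Sequence[Solution]
    evaluate: Evaluation


@dataclass
class Trace:
    """What characterizes an algorithm on a problem: points searched and cost."""
    points: List[Solution]
    cost: int                                     # assumption: number of calls of F


Algorithm = Callable[[Problem], Trace]


def enumerate_all(p: Problem) -> Trace:
    """A trivial algorithm a: evaluate every candidate once."""
    points = list(p.candidates)
    for t, s in enumerate(points):
        p.evaluate(s, t)
    return Trace(points=points, cost=len(points))


def X(a: Algorithm, p: Problem) -> float:
    """X: A x P --> R, taken here to be the best (largest) value found."""
    trace = a(p)
    return max(p.evaluate(s, t) for t, s in enumerate(trace.points))


def U(a: Algorithm, problems: Sequence[Problem], weights: Sequence[float]) -> float:
    """U: A --> R as a weighted average of X(a, p) over the problem set P."""
    weighted = sum(w * X(a, p) for w, p in zip(weights, problems))
    return weighted / sum(weights)
```

Other choices of X (for example one that also charges for cost) fit the same signatures; the sketch fixes one only so that U can be computed.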


Each of these four items is examined in detail below.
Examples are drawn primarily from the fields of function
optimization and artificial intelligence, although research
into search techniques is by no means restricted to these
areas. Cooper and Steinberg (1970) and Sampson (1976) are
suggested references, respectively, for those two subjects.


1.2 SEARCH PROBLEMS

The features which render a search problem non-trivial 
can be grouped under four headings. 

1. A priori ignorance is the absence of useful
information about the problem. The set S may have no 
obvious bounds. In this case most search algorithms must 
arbitrarily impose limits and work within them, whether or 
not the desired solution is in the chosen region. For 
example, Samuel's (1959) program for learning the game of 
checkers uses 40 parameters in its evaluation of board 
positions. The learning process is a search for the optimal
weights for the parameters in a linear equation. There is 
no guarantee, however, that the best program uses only those 
40 parameters. Samuel admitted that this was an arbitrary 
and unsatisfactory restriction on the search, but could see 
no way to extend the limits to include all possible checkers 
strategies. 

A priori ignorance also includes lack of information 
about the evaluation function F. In function optimization 
today, most methods are effective on unimodal surfaces only 
and many assume continuity and even differentiability. But
a process control problem, for instance, may not yield a 
closed form function which can be examined for these 
features. The only evaluation procedure may be to implement 
the proposed solution in the process environment and observe 
results. In this case, F may or may not be unimodal. Any
results from a unimodal search method would be unreliable,
since the algorithm might very well be operating on a
problem it is incapable of optimizing.

In general, a priori ignorance of the nature of the
evaluation function can only be overcome by an algorithm 
which is effective over the entire range of functions which 
can be characterized by any known information. 

2. This ignorance concerning the problem can be 
differentiated from static complexity, in which the problem 
is known to have specific features which make it hard to 
search. The size of the solution set S can determine 
problem complexity, as can many characteristics of the
evaluation function. Items which fall under the heading of
static complexity have been a prime target for research in
the field of function optimization, for example high
dimensionality, multimodality, and discontinuity in the
evaluation function. Static complexity calls for algorithms
which are tailored to particular problem features. 

3. When a problem's evaluation function is not simply 
a function of the solutions in S, but also of time, the problem
is said to involve dynamic ignorance. In many situations, 
the value of a solution changes from one time to another. 
The optimal amount of steam production to keep a building at 
room temperature on a September afternoon may not be the 
optimal amount four months later, or even twelve hours 
later. Functions which vary with respect to time require
algorithms capable of tracking good solutions or handling
large quantum changes in the evaluation function.

4. Dynamic uncertainty occurs when the function F is a
probability function over R. The value obtained for a given 
solution s at a time t may be the result of a 
nondeterministic process, or may be perturbed by random 
noise. An excellent example of nondeterminism, which has 
been carefully analyzed by Holland (1975), is the problem of 
the two-armed bandit. A gambler is informed by the casino 
manager (who is honest) that one of two slot machines, Y and 
Z, pays off more frequently than the other. How should the 
gambler allocate his fixed stake to the two machines to 
optimize his total payoff? In the search formalism, S is 
the set of two plays, one on Y and one on Z. F(s), the
payoff for each play s, is a probability function based on 
the two machine payoff frequencies. A good algorithm for 
this problem must respond to both the a priori ignorance
regarding which machine is better and the nondeterministic
manner in which the elements of S are given values.
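
Dynamic uncertainty in the two-armed bandit can be illustrated by simulating F directly. This is a minimal sketch, not a solution to the allocation problem: the payoff probabilities 0.3 and 0.6, the unit payoff, and the names make_bandit and estimate are all assumptions chosen for illustration.

```python
import random


def make_bandit(p_y: float, p_z: float, rng: random.Random):
    """Return F(s) for S = {"Y", "Z"}.

    Each play pays 1 with the machine's payoff probability and 0 otherwise.
    The probabilities are hypothetical; the gambler does not know them.
    """
    probs = {"Y": p_y, "Z": p_z}

    def F(s: str) -> int:
        return 1 if rng.random() < probs[s] else 0

    return F


def estimate(F, s: str, trials: int) -> float:
    """Average payoff observed over repeated plays of the same solution s."""
    return sum(F(s) for _ in range(trials)) / trials


rng = random.Random(0)
F = make_bandit(0.3, 0.6, rng)
est_y = estimate(F, "Y", 5000)   # repeated evaluation of one s gives varying values
est_z = estimate(F, "Z", 5000)
```

A single call of F says little about a machine; only repeated trials, which consume part of the fixed stake, reveal which machine is better.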


1.3 SEARCH ALGORITHMS

Search problems can be categorized according to 
features of difficulty. A taxonomy can also be constructed 
for algorithms. Classification in this case is dependent 
upon the techniques employed in searching. 

Indirect or analytical methods are usually not 
considered search algorithms, since they do not evaluate 
points in S during their operation. Instead, a priori 
knowledge of F and S is used in a series of computations to
arrive at the desired solution. The classical methods of
function optimization, e.g. finding the minimum point of a
parabola by setting the derivative to zero, are indirect 
methods. 

Random search is the evaluation of a sequence of 
unrelated solutions. If the set S is sufficiently small, 
random search often can be refined into enumeration, where 
all elements of S are evaluated. Random search is the 
cheapest form of search and so serves as a benchmark in 
algorithm comparisons. It is also the best method currently 
known for some very difficult problems. 
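
Random search and enumeration are simple enough to state as code. The sketch below assumes a maximization problem, a box-bounded real-valued S for random search, and a finite S for enumeration; the function names and the bounds argument are hypothetical.

```python
import random


def random_search(F, bounds, evaluations, rng):
    """Evaluate a sequence of unrelated, uniformly sampled points.

    bounds is a list of (low, high) pairs, one per dimension (an assumption
    of this sketch). Returns the best point found and its value.
    """
    best_s, best_v = None, float("-inf")
    for _ in range(evaluations):
        s = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        v = F(s)
        if v > best_v:
            best_s, best_v = s, v
    return best_s, best_v


def enumerate_search(F, S):
    """Enumeration: evaluate every element of a (small, finite) set S."""
    return max(((s, F(s)) for s in S), key=lambda pair: pair[1])
```

Because each sample is independent of the ones before it, the method needs no memory beyond the best point so far, which is what makes it a natural benchmark.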

Adaptive search has already been defined as the use of 
feedback information to direct the search process. 
Algorithms which are adaptive in nature can be grouped 
according to the sources of the information employed. 

I is the vector of information available from the solution
previously evaluated. In a first-order method, I
consists only of the measured value of the solution,
F(s). In higher-order methods, I may incorporate
information such as derivative values or individual
components of F(s), in addition to F(s).

M is the memory of previous steps in the search process.
Many algorithms use a fixed amount of space to save the
history of the search in some condensed form. This
memory is constantly updated with information from I and
used to select new solutions for testing. Dichotomous
search in function optimization requires memory only of
the previous two solutions evaluated, while "conjugate
direction" methods maintain information on the directions
of the previous n searches.

G consists of gift information, or hints. This is not
feedback material, but is input from outside the search
process which is employed along with information from I
and M. It is important to equalize gift information when
making comparisons between methods.


1.4 ALGORITHM EVALUATION 

To reiterate, the observed performance and resources 
required by an algorithm on a problem are evaluated by means 
of a function X. X combines certain cost measures with 
performance measures derived from the set of solutions
tested during search. Measures of resources used include: 

1) space costs, in terms of the room needed to store M 
and I in an adaptive algorithm, plus, possibly, the space 
required for storage of the algorithm itself, and 

2) time cost, which is measured in any of: computer CPU
time (if applicable), number of primitive operations
performed, number of solution points searched in S, or
number of invocations of the function F. Computer time is 
machine dependent. CPU time and counts of primitive 
operations are sensitive to variations in the coding of the 
algorithm. The number of solutions searched may hide large 
time expenditures, such as multiple function evaluations 
used to approximate derivatives at each solution point. 
Counting calls of F ignores such things as the time spent in
updating the algorithm's memory, which may be considerable.
Nonetheless, function calls are usually assumed to be orders
of magnitude more expensive than other steps in algorithm
execution, so function invocations are a popular measure of
time cost.
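
Counting invocations of F is easy to instrument. The following sketch wraps an evaluation function in a counter and shows how one "solution point" can hide two calls when a derivative is approximated numerically. The class name CountingFunction and the forward-difference helper are hypothetical illustrations.

```python
class CountingFunction:
    """Wrap an evaluation function F and count how often it is invoked."""

    def __init__(self, F):
        self.F = F
        self.calls = 0

    def __call__(self, s):
        self.calls += 1
        return self.F(s)


def forward_difference(F, x, h=1e-6):
    """Approximate dF/dx at x; costs two calls of F for a single point x."""
    return (F(x + h) - F(x)) / h


counted = CountingFunction(lambda x: x * x)
slope = forward_difference(counted, 3.0)   # one solution point, two evaluations
```

Here a tally of solution points would report one unit of cost, while counted.calls reports the two evaluations actually made.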

Three common measures of performance are overall best, 
offline, and online F values. The overall best value is 
just that: the value of the single best solution out of all 
solutions evaluated. This measure is of interest in 
function optimization and other problem areas where one end 
product is the goal of the search. 

Online performance is measured by the (possibly 
weighted) average evaluation of all solutions examined. In 
a field such as process control, the evaluation of a 
solution consists of implementing the solution in the 
environment. It is desirable to improve the quality of the 
control while at the same time not subjecting the 
environment to poor, possibly damaging, control solutions. 
Online performance in such a case refers to the average 
quality of the solutions tested. 

Some problems require a steady stream of solutions, as 
in the case of online process control, but the evaluation 
procedure is not equivalent to the implementation of the 
solution. If the process environment is simulated on a 
computer, solutions can be tested there, while the best 
solution to date at any time is employed in the real 
environment. This offline performance, as it is called by
De Jong (1975), can be measured by the average over all time
steps of the best value up to that time.
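
The three performance measures can be computed from the sequence of F values observed during a search. This sketch assumes maximization and unweighted averages; the function names are hypothetical.

```python
def overall_best(values):
    """Value of the single best solution out of all solutions evaluated."""
    return max(values)


def online_performance(values):
    """Unweighted average evaluation of all solutions examined."""
    return sum(values) / len(values)


def offline_performance(values):
    """Average over all time steps of the best value found up to that step."""
    best = float("-inf")
    total = 0.0
    for v in values:
        best = max(best, v)
        total += best
    return total / len(values)
```

For the observed sequence 1, 3, 2, 5 the best-so-far sequence is 1, 3, 3, 5, so the overall best is 5, the online performance is 11/4 = 2.75, and the offline performance is 12/4 = 3.0. Offline performance is never lower than online performance under these definitions, since poor trial solutions are not charged against it.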

There are many ways to incorporate cost and performance 
measures into the function X. Most functions reflect the 
conditions, if any, which terminate the search process. If
the search is ongoing, X will be a function of the cost per
time step and the online or offline performance. If the 
search terminates, it may be in consideration of one of the 
following three goals: 

Fixed Resources. The algorithm consumes fixed amounts 
of resources and is evaluated solely on its performance. X 
is a performance measure value obtained after the time 
interval allotted for search has elapsed. 

Fixed Performance. The algorithm continues until a 
certain level of performance is achieved. X becomes the
cost in time and/or space which is incurred to reach the 
given performance threshold. 

Variable Resources and Performance. In some problems
resource limitations are unimportant and the range of
performance values is unknown. For these, the best mode of
operation would allow the search process to continue as long
as advances were made in performance, that is, as long as 
the algorithm had not exhausted its potential for fruitful 
search. Cavicchio (1970) proposed the technique of stopping 
the search when the rate of change of performance fell below 
some threshold. X would then have to reflect both cost and
performance.


1.5 RESEARCH GOALS

The purpose of research on a specific algorithm is to 
determine its usefulness on a set of problems. It is 
improbable that there is one method which is more efficient 
and more effective than all other methods on all search 
problems. Therefore it is up to the researcher to narrow 
the definition of "useful" to encompass only a limited 
problem set P and a reasonable function U for algorithm 
utility.

This work will focus on genetic algorithms as applied 
to two problem sets. The first set will be hard problems in 
(static) function optimization, for which random search has
been the best algorithm found so far. The second will be a
set of function optimization problems which span a large
range of difficulty for standard search techniques such as
descent methods. The goals in using these two groups are
1) to find good genetic algorithms for function optimization
by testing various algorithm features on hard functions, and
2) to measure the robustness of genetic algorithms over a
wide range of function optimization problems. Results from
these investigations should lend further insights into the
domain of applicability of genetic algorithms, as well as a
better understanding of the algorithms themselves.




CHAPTER 2


GENETIC ALGORITHMS 


Genetic algorithms are adaptive search procedures which 
simulate some of the processes of natural evolution. Groups 
of organisms often exhibit the ability to adapt to an 
environment (or environments) over several generations. 
Such adaptation at the population level can often be shown 
to be the result of, among other things, events in the 
differential transmission of genetically variable 
information at the molecular level. Molecular genetics 
involves the determination and analysis of the molecular 
structures of heredity, while population genetics is the
mathematical study of the consequences of inheritance with 
respect to groups of organisms. 

Genetic algorithms simulate a population of organisms 
over several generations. The representation and
modification of the organisms is based on information from 
molecular genetics, while the expected behavior of the 
algorithms is derived using the methods of population 
genetics. However, genetic algorithms are intended to be 
problem-solving methods in fields such as function 
optimization. As such, they do not necessarily contribute
to the various fields of genetics. 

In this chapter, genetic algorithms are set within the
formal framework of Chapter 1. Since the terminology of
genetics most accurately describes the features of the
algorithms, this terminology prevails in the discussion.


2.1 THE REPRODUCTIVE PLAN

A genetic algorithm is adaptive. At each step in the
search process, it has the following information:


I   This has been simply F(s) in all genetic algorithms thus
    far. They are first-order methods.

M   M is a population (group) consisting of organisms
    evaluated at earlier time steps. This population does
    not include all organisms from previous steps, only a
    chosen subset.

The Reproductive Plan, or outline of the cycle of
operation, has five basic steps:


1. n organisms are generated to form the
population.

2. n' organisms are selected according to fitness
and copied to form a set of parents.

3. g offspring are created from the set of
parents by the application of operators.

4. g members of the population are selected for
replacement. These organisms expire and are
replaced by the offspring from step 3.

5. A new generation (population) has been formed.
The cycle begins again at step 2.
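
The five steps can be arranged as a loop. The following is a sketch of one possible reading of the outline, not a particular published algorithm. It assumes that fitness values are non-negative and not all zero, that parents are drawn with probability proportional to fitness, and that the g members to be replaced are chosen uniformly at random (the outline leaves this open). The callables new_organism and make_offspring are hypothetical placeholders supplied by the caller.

```python
import random


def reproductive_plan(F, new_organism, make_offspring,
                      n, n_parents, g, generations, rng):
    """Run the five-step cycle for a fixed number of generations."""
    # Step 1: n organisms are generated to form the population.
    population = [new_organism(rng) for _ in range(n)]
    for _ in range(generations):
        # Step 2: n' organisms are selected according to fitness and copied.
        fitness = [F(s) for s in population]
        parents = rng.choices(population, weights=fitness, k=n_parents)
        # Step 3: g offspring are created from the parents by operators.
        offspring = [make_offspring(parents, rng) for _ in range(g)]
        # Step 4: g members are selected for replacement (assumption:
        # uniformly at random) and replaced by the offspring.
        for slot, child in zip(rng.sample(range(n), g), offspring):
            population[slot] = child
        # Step 5: a new generation exists; the loop returns to step 2.
    return population
```

The population plays the role of the memory M: it is the only state carried from one pass of the loop to the next.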

A genetic algorithm can be applied to any problem
represented in the following manner:


S is a set of organisms (structures). Each organism
has 
1) a genotype (genetic structure) consisting of 
chromosomes (strings) of alleles (values). Each 
locus (position) on a chromosome has a set of 
possible alleles associated with it. 
2) a function D which maps the genotype of the 
organism onto a set of features evaluated by F. 
This set is called the phenotype. Thus 
D: genotype-->phenotype. 

F is the environment within which organisms are
evaluated. An evaluation results in a fitness value
F(s) for an organism s.


For example, assume the problem is to find the
maximum of a function of two variables, H:XxY-->R,
where each variable ranges from 0 to 999, with the
additional restriction that the solution's values for x
and y must be integers. A solution s, say s=(905,82),
can be represented as one chromosome of six loci
"a1a2a3a4a5a6", or in this case "905082". Each of the
loci has the same set of ten possible alleles, the
digits 0 through 9. The function D maps this
chromosome onto the set of integer pairs (x,y) (the
phenotype) using the equations

     x = 100 a_1 + 10 a_2 + a_3
     y = 100 a_4 + 10 a_5 + a_6

The fitness of a solution is simply H(x,y).
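
The mapping D for this example is ordinary positional decoding of two three-digit numbers, as implied by the pairing of s=(905,82) with the chromosome "905082". A minimal sketch follows; the names decode and encode are hypothetical, and H itself is left to the caller.

```python
def decode(chromosome: str) -> tuple:
    """D: genotype --> phenotype for a six-locus decimal chromosome."""
    a = [int(ch) for ch in chromosome]
    x = 100 * a[0] + 10 * a[1] + a[2]
    y = 100 * a[3] + 10 * a[4] + a[5]
    return x, y


def encode(x: int, y: int) -> str:
    """Inverse of decode for integers in the range 0 to 999."""
    return f"{x:03d}{y:03d}"
```

With these definitions, decode("905082") yields (905, 82) and encode(905, 82) yields "905082", so the fitness of the chromosome is H(905, 82).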


2.2 IMPLEMENTATIONS 

Each of the steps in the Reproductive Plan can be 
implemented in several ways. Most features of the genetic 
algorithms studied up to now are modelled on some natural 
system. Arbitrary design decisions must occasionally be 
made, however, in the absence of information on natural 
systems, or when simplification of the model is necessary. 

The theses of Bagley (1967), Cavicchio (1970),
Hollstien (1971), Frantz (1972), and De Jong (1975) include
research on specific algorithm features. Bagley compared
certain genetic algorithms with non-adaptive correlation
methods. Hollstien applied genetic algorithms to problems
of control policy formation. Frantz observed the effects on
genetic algorithms of non-linearities in the function F.
Cavicchio and De Jong examined various parameter settings
for genetic algorithms applied to problems in pattern
classification and function optimization, respectively.


2.2.1 Initialization

The initial population has in all cases so far been
generated according to a uniform random distribution over
the set S of possible solutions. This could be altered if
there was a priori information indicating that the search
should be biased toward certain subsets of S.


2.2.2 Parent Selection

The success of the genetic algorithm as an adaptive 
search technique depends on the balance between history 
retention and exploration. The population M must serve both 
as the memory of good solutions tested and as the repository 
for new solution points. History retention is accomplished 
mainly by parent selection according to fitness. An 
organism is selected to be a parent with a frequency 
proportional to its fitness relative to the rest of the 
population. Offspring will resemble their parents, so good 
organisms are maintained through succeeding generations.

Relative fitness can be defined in terms of rank in the
population (Cavicchio, 1970; Hollstien, 1971). However,
this seems to lead to overemphasis of a few top organisms of 
possibly marginal superiority, resulting in loss of 
population variance. Loss of population variance has been a 
recurring difficulty in the implementation of genetic 
algorithms. It is discussed in Section 2.4. 


Relative fitness is more commonly measured as
proportion of total population fitness,

     f(s_i) / \sum_{s \in M} f(s)

(Cavicchio, 1970; Frantz, 1972; De Jong, 1975). Thus if an organism
accounts for 0.10 of the total population fitness, it should
be selected 0.10*n' times to be a parent. Organisms of less
than average utility will be represented fewer times in the
set of parents than those of above average fitness.

Selection according to proportion of population fitness
can be accomplished by adjusting the fitnesses to compensate 
for possible negative values and then using the fitnesses to 
impose a probability distribution on the population. Parent 
selection is then a matter of sampling from this 
distribution. Sampling must be accurate; if the poorer 
organisms are overselected, history retention suffers, but
if the better organisms are overselected, loss of population
variance results. The issues involved in adjusting
fitnesses and sampling are examined in detail in Chapter 4.
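
Selection by proportion of population fitness can be sketched in a few lines. The adjustment for negative values used here, subtracting the smallest fitness, is only one possibility and is an assumption of this sketch (it gives the worst organism a selection probability of zero). The function names are hypothetical, and sampling with replacement is used as the simplest sampling scheme.

```python
import random


def selection_probabilities(fitnesses):
    """Turn raw fitnesses into a probability distribution over the population."""
    lowest = min(fitnesses)
    # Assumed adjustment: shift so that no fitness is negative.
    shifted = [f - lowest for f in fitnesses] if lowest < 0 else list(fitnesses)
    total = sum(shifted)
    if total == 0:                       # all organisms equally fit
        return [1.0 / len(fitnesses)] * len(fitnesses)
    return [f / total for f in shifted]


def expected_copies(fitnesses, n_parents):
    """Expected number of times each organism appears among n' parents."""
    return [p * n_parents for p in selection_probabilities(fitnesses)]


def select_parents(population, fitnesses, n_parents, rng):
    """Sample n' parents, with replacement, from the imposed distribution."""
    weights = selection_probabilities(fitnesses)
    return rng.choices(population, weights=weights, k=n_parents)
```

An organism holding 0.10 of the total fitness has an expected 0.10*n' copies among the parents, but the number actually drawn varies from sample to sample; this variation is the stochastic error that careful sampling seeks to control.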


2.2.3 Offspring Generation 

If offspring were merely copies of their parents, no 
new organisms would be evaluated. Search would be limited 
to those organisms represented in the initial population. 
Exploration is introduced into the algorithm through genetic 
operators. These operators alter the offspring so that they 
still resemble, but are not identical to, their parents.

The mutation and crossover operators, when used together, 
provide for adaptive exploration. 

Mutation. Point mutation consists of replacing an 
allele in a chromosome by a random allele from the set of 
possible alleles for that locus. In the example of the two 
dimensional function, a point mutation might change the 
chromosome "905082" into "905012". Loci to be mutated are
chosen randomly, with a probability P_m that any one locus is
mutated. This is a simplification of the situation in
natural systems, where each locus may have its own frequency
of mutation, and mutation rates may be dependent on
environmental factors such as radiation.

Frequent application of the mutation operator can 
result in randomly generated offspring which bear little 
resemblance to previous generations. Mutation in general 
hinders the history retention of adaptation and lends a 
degree of randomness to the search. It can be used as an 
instrument for exploration, but it serves a more important 
function. Mutation prevents the permanent disappearance of 
an allele from the population. 

Assume that the genotype consists of a single 
chromosome of unique loci, each with two possible alleles, 
and that the entire population of size n is replaced each 
generation. The minimum mutation rate needed to insure the 
presence of an undesirable allele somewhere in the 
population for each locus at each generation is roughly 
P_m = [...] (Holland, 1975). De Jong (1975) found that for
values of P_m between [...] and [...], offline performance was much
the same, but that the higher mutation rates degraded online
performance. In the end he settled on a mutation rate of
[...] as optimal for online performance and acceptable for
offline performance.

Crossover. Crossover introduces adaptive exploration
into genetic algorithms. Crossover occurs when two
chromosomes exchange segments (see Figure 2.1).

[Figure 2.1. Panels: (a) Homologous Crossover, (b) Unequal
Crossover, (c) Multiple Crossover.]

In natural organisms, crossover usually occurs between
matching sites on the two chromosomes. Let locus 1 have
allele set {a,A}, locus 2 have allele set {b,B}, etc. Then
Figure 2.1.a illustrates a crossover at a point of homology:
both chromosomes break between loci two and three. Two
chromosomes are homologous if all (or almost all) of their
loci correspond, even though they do not have the same
alleles at each of the loci. Crossover occasionally occurs
at non-homologous points. Figure 2.1.b shows unequal
crossover, where two homologous chromosomes cross at points
that do not correspond. This is a possible natural
mechanism for deletions and duplications of loci.

Natural chromosomes are not restricted to one cross per
chromosome (see Figure 2.1.c for an example of multiple
crossover). Each point on a chromosome may have its own
frequency for breaking and recombining. This frequency may
be affected by environmental factors. Except for a brief
examination of double crossover by Cavicchio (1970),
experiments and analysis of genetic algorithms have been
limited to the case of at most one crossover per chromosome
pair, where pairs cross with a fixed frequency P_c. If
crossover occurs, all points on the chromosome have equal
probability of being the crossover point. De Jong
determined experimentally that P_c = 0.6 optimized algorithm
online and offline performance on a variety of functions.
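
The single-crossover case described above can be sketched as follows. This is an illustrative sketch only: the function name `single_crossover`, the list representation of chromosomes, and the choice to return both possible offspring are assumptions made here.

```python
import random

def single_crossover(parent_a, parent_b, p_c=0.6, rng=random):
    """Cross two equal-length chromosomes at most once.

    With probability p_c one crossover point is drawn uniformly from the
    L-1 positions between adjacent loci; otherwise the parents are copied.
    """
    assert len(parent_a) == len(parent_b)
    length = len(parent_a)
    if length < 2 or rng.random() >= p_c:
        return list(parent_a), list(parent_b)
    point = rng.randint(1, length - 1)  # break after the first `point` loci
    child_a = list(parent_a[:point]) + list(parent_b[point:])
    child_b = list(parent_b[:point]) + list(parent_a[point:])
    return child_a, child_b
```

Because the crossover point is homologous (the same position on both parents), both offspring keep the parental chromosome length.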

In addition to selection according to fitness, natural
adaptation relies on recombination between individuals for 
the development of improved populations. Crossover is an 
important tool for recombination. When each organism is 
represented by a single chromosome, crossover can occur 
between the chromosomes of two parents to form offspring 
which combine the features of both. This is analogous to 
the exchange of genetic information as it occurs in single 
chromosome bacteria. The offspring are not identical to 
either parent, so new organisms are evaluated, but the 
offspring still resemble the parents, so the exploration of 
new organisms is directed by knowledge of previous
evaluations. 

Mating. The inclusion of recombination between parents
in step 3 raises questions about mating. Parental groups
are limited to either one or two parents. If two parents
are paired to create offspring, there must be a mechanism
for selecting the two from the set of parents. Hollstien
(1971) described many natural mating schemes. He concluded
that inbreeding (mating within "families", or groups of
organisms with common ancestry) combined with crossbreeding
(mating between the best organisms in different families)
provided good performance when incorporated into a genetic
algorithm. However, it is not clear that the increased
expense of controlled mating is justified in light of the
simplicity and good performance of random mating.

Natural systems often employ restrictions on gender in 
mating. Population geneticists are not sure what the 
advantages of having genders are; in fact, the causes and 
benefits of recombination and sex in natural genetic systems 
are still not understood (Maynard Smith, 1978). Therefore, 
as a simplification, this work employs random pairing
(without genders) within the set of n' parents to form n'/2
parental groups. Furthermore, each parental group produces
exactly one offspring, which is chosen randomly from the
possible offspring resulting from crossover. Thus n" = n'/2.


2.2.4 Replacement Selection 

In addition to parent selection, replacement selection 
influences the composition of the next generation. The 
predominant natural method involves the deletion of inferior 
organisms, or "survival of the fittest". However, in
genetic algorithms the combination of small populations (50
or so), parent selection according to fitness, and
replacement selection in inverse proportion to fitness leads
to loss of population variance, and thus poor performance.
Cavicchio (1970) tried replacing parents with their
offspring, modelling the way salmon, for instance, die
immediately after mating. In general, random replacement
has been used (De Jong, 1975) for simplicity. Note that if
g (generation replacement size) equals n (population size),
then replacement selection is no longer an issue, as the
entire population is replaced.


2.2.5 Population Size and Generation Gap 

Holland (1975) analyzed the expected behavior of 
genetic algorithms with infinite populations. But in an 
implementation, the size n of the population is restricted 
by the space resources available for the simulation. 
Although it is possible to have populations where the number 
of organisms varies up to a preset maximum, it is unclear 
what advantage such a scheme would provide. Thus it has 
been standard practice to use populations of fixed size, and 
set n"=g. Cavicchio (1970) and Hollstien (1971) used
populations of sizes 12 to 20, but De Jong (1975) concluded
that populations of under 50 suffered from loss of 
population variance. 

De Jong also investigated different possibilities for
generation gaps, that is, for g relative to n. He decided
that there was no advantage to overlapping generations,
where g<n. His final implementation set n=g=50.


2.3 ISSUES OF REPRESENTATION

The performance of a genetic algorithm on a problem
depends heavily on the representation the problem is given.
This discussion concerns the representation of function
optimization problems, although many of the concepts apply
elsewhere.


2.3.1 Allele Sets and Chromosome Size 

A solution to a function optimization problem is an 
ordered set of numbers (x_1, x_2, ..., x_d), where d is the number
of dimensions in the function. In the example of
Section 2.1, such a set was encoded as a chromosome composed
of decimal digits. The chromosome could also have contained
binary, octal, or hexadecimal digits. In the binary case,
each locus would have an allele set of {0,1}. The 
chromosome would have to be longer than in the decimal case 
to represent solutions in the same solution set S. 

It is not immediately obvious which is preferable, long 
chromosomes and small allele sets, or short chromosomes and 
large allele sets. Two extreme points of view are 
represented by models used in population genetics for the 
prediction of allele frequencies. The first, the infinite
site model, assumes that for one organism feature there are
an unlimited number of loci. Each locus may be mutated to
an alternate allele, but the mutation rate is small, so the
same locus will never mutate twice. In the infinite allele
model, a function is encoded using one locus with an
infinite number of possible alleles. It is assumed that
mutation will always introduce a non-existent allele into
the population.

One point in the analysis of these two models is of 
interest here. Under the infinite allele model, assuming a 
diploid population, no recombination, and random parent 
selection, the expected number of alleles present at one 
locus in a population at equilibrium is 4nu + 1, where n is
the population size and u is the probability of mutation at
each time step (Kimura, 1968).

In addition to his mathematical analysis, Kimura 
performed simulation experiments on a population of 100 
one-locus individuals which initially contained 200 distinct
alleles. In accordance with the theory, a mutation rate of
0.01 produced equilibrium populations (after about 20
generations) which contained only 5-10 alleles. Thus, even
without selection according to fitness, a population with
the capacity for several hundred different alleles cannot be
expected to maintain more than a few alleles at any one
locus. With selection, the number of alleles would shrink
still further.
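
As a rough check on these figures, the equilibrium expression can be evaluated for the simulation parameters. The sketch below assumes the expression has the Kimura-Crow form 4nu + 1, with n = 100 individuals and mutation probability u = 0.01; the symbol u is chosen here for illustration.

```latex
4nu + 1 = 4 \times 100 \times 0.01 + 1 = 5
```

Under that assumption the predicted value sits at the lower end of the 5-10 alleles observed in the simulation.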

The infinite allele model is analogous to the situation 
where one x-value of a function solution is encoded using 
one locus on a chromosome. The only mechanism for 
introducing a new value into the population is mutation; 
crossover does not alter the alleles at individual loci. 
Since the number of alleles is large, only a large mutation
rate will permit productive search. But as has been shown,
the result of a high mutation rate is random search.

The alternative is to use many loci to represent one 
x-value. The advantage is now obvious: if x_i is represented
by a gene comprising several loci, crossover becomes a tool
for the creation of new x-values. The more loci there are
in the gene, the more crossover is involved in this task,
since the probability of crossover within the gene
increases. Also, if there are fewer alleles in the allele 
set for one locus, a smaller mutation rate will be 
sufficient to maintain all of those alleles in the 
population. 

The benefit in having multiple locus genes instead of a
single locus for each parameter can be summarized as an
increase in adaptive power. Instead of creating new
x-values only via mutation, new values are formed adaptively
by recombining the smaller "building blocks" of individual
loci.

For a fixed number of genic values, the maximum number 
of loci in a gene occurs when each locus has only two 
alleles. The advantage to having many loci is now apparent.
Therefore, the optimal representation for function values in 
a genetic algorithm is binary. This is consistent with the 
situation in natural systems, where many individual sites 
with four possible values each are concatenated to form one 
gene. 
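
The trade-off between allele set size and gene length can be made concrete with a small calculation. This is an illustrative sketch; the helper name `loci_needed` and the choice of 1000 genic values are assumptions made here.

```python
def loci_needed(num_values, allele_set_size):
    """Smallest number of loci whose allele combinations cover num_values."""
    loci, capacity = 0, 1
    while capacity < num_values:
        capacity *= allele_set_size
        loci += 1
    return loci

# Gene length for 1000 genic values under several allele set sizes.
for base in (2, 8, 10, 16):
    print(base, loci_needed(1000, base))
```

For 1000 genic values a binary gene needs 10 loci while a decimal gene needs only 3, so the binary gene offers the most interior points at which crossover can act.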

Assume, then, that for each dimension i of a function,
the minimum value minx_i and the resolution between points
Δx_i are known. If x_i is represented by loci j through k on
the chromosome, then the phenotype map D is

    x_i = \min x_i + \Delta x_i \sum_{v=j}^{k} a_v 2^{k-v},

where a_v is the allele (0 or 1) at locus v.

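
The phenotype map for one dimension can be sketched in code. The sketch assumes the map has the form x_i = minx_i + Δx_i * sum over v of a_v * 2^(k-v), with the leftmost locus of the gene carrying the highest power of two; the function name `decode_gene` and its argument names are hypothetical.

```python
def decode_gene(alleles, min_x, delta_x):
    """Map the binary alleles a_j..a_k of one gene to its x-value.

    `alleles` lists the gene left to right, so the first locus carries
    the highest power of two and the last locus carries 2**0.
    """
    k = len(alleles) - 1
    integer = sum(a << (k - v) for v, a in enumerate(alleles))
    return min_x + delta_x * integer
```

For example, the gene 101 with a minimum of 0 and a resolution of 1 decodes to 5.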
2.3.2 Bounds and Constraints

No matter what allele sets are chosen, it is possible 
that the representation will span solutions which are not in 
S. If x_1 is limited to the integers between 0 and 5
inclusive and a binary representation is used, three loci
will be needed. With three binary loci, eight integers can
be represented, so the representation allows two solutions
which are outside S.
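
The count of out-of-range codes in the example above can be reproduced with a short calculation. This is a sketch only; the helper name `wasted_codes` is an assumption made here.

```python
def wasted_codes(num_legal_values):
    """Return (loci, extra) for a minimal binary representation:
    the number of binary loci needed and the number of bit strings
    that decode to values outside the legal range."""
    loci = max(1, (num_legal_values - 1).bit_length())
    return loci, 2 ** loci - num_legal_values
```

For the six integers 0 through 5 this gives three loci and two illegal strings; no strings are wasted only when the number of legal values is a power of two.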

In theory, a new representation could be chosen which 
takes into consideration the bounds on the search set S. In 
actuality it is usually not obvious how to construct this
new representation, or, if it is constructed, to show that it
will be an effective representation for the genetic 
algorithm. In the same way, it is theoretically possible to 
build the constraints of a function optimization problem 
into the representation so that all represented solutions 
are legal. No methods exist for such transformations; the 
addition of constraints to a problem usually renders it 
unsolvable by unconstrained search methods. 

The subjects of bounds and constraints will not be
considered here. Experiments will be limited to
unconstrained functions which have solution sets with
magnitudes of even powers of two.


2.3.3 Genotypic Variations

A binary representation has one locus for each power of 
two. In standard notation, a binary number is written left 
to right in order of decreasing powers of two. However, 
there is no reason to assume that this ordering is superior
to any other with respect to genetic algorithms, especially 
when several numbers are concatenated on a chromosome. 
Grouping the high-order loci from all dimensions together, 
for example, might change the effect of crossover in the 
search process. 

Besides variations in locus order, there are many 
alternative formats which can be used to represent binary 
numbers in a genotype. In nature, genotypes range from 
single chromosomes with fixed orderings of the loci to
multiple chromosomes, polyploid genotypes, and chromosomes
of variable length. With the exception of the two studies
of Bagley (1967) and Hollstien (1971), work on genetic
algorithms has been limited to genotypes of a single 
chromosome. More complex organizations should be 
considered, however. 

Multiple Chromosomes. Often in higher organisms the 
loci of the genotype are distributed on several chromosomes. 
These chromosomes may recombine with homologous chromosomes 
from another parent, but during the creation of offspring 
they assort independently. The resulting situation can be
described as the loss of linkage between the loci on
different chromosomes.

Linkage refers to the distance between two loci on a
chromosome. A high linkage value indicates that the loci 
are close together; a value of zero means they are not on 
the same chromosome. In a single chromosome parent, the 
alleles of two adjacent loci will stay together in the 
offspring unless crossover occurs at the point between them. 
In a multiple chromosome parent, the loci on separate 
chromosomes are not linked and will end up in the same 
offspring only by chance.

Thus multiple chromosomes may be used for grouping loci 
whose functions are known to be independent. At the moment 
it is unclear whether any further benefit is derived from 
multiple chromosomes in natural organisms. 

Polyploidy. Polyploidy occurs when a genotype contains
homologous chromosomes. A common form is diploidy, in which 
each chromosome is represented twice in the genotype. Since 
two alleles are present for each locus, the function 
D:genotype-->phenotype must include some mapping which 
reduces two allelic values into one locus value. This is 
called the dominance map. 

By using a dominance map it is possible to carry
alleles in the population without testing them in the 
environment at every generation. This "masking" of alleles 
may combat the loss of population variance. Diploidy and 
dominance are examined in Chapter 5. 
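
To make the idea of a dominance map concrete, one very simple possibility is sketched below. It is a hypothetical rule chosen purely for illustration (one allele value is dominant at every locus) and is not the scheme examined in Chapter 5; the function name `express` is likewise an assumption.

```python
def express(chrom_a, chrom_b, dominant=1):
    """Reduce two homologous binary chromosomes to one expressed string.

    Hypothetical rule: the `dominant` allele is expressed whenever it
    appears on either chromosome at a locus; otherwise the recessive
    allele is expressed.
    """
    recessive = 1 - dominant
    return [dominant if dominant in (a, b) else recessive
            for a, b in zip(chrom_a, chrom_b)]
```

Under this rule a recessive allele paired with a dominant one is carried in the genotype but never evaluated, which is the "masking" effect described above.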

Circular Chromosomes. Thus far chromosomes have been
presented as one-dimensional, linear strings. There are no
precedents in natural systems for multidimensional packages
of genetic material, plates of alleles, for example. But
there are bacteria and viruses which maintain genetic
material in circular form.

On a linear chromosome, the loci at the endpoints are 
L-1 loci apart, where L is the chromosome length. A locus
in the middle of the chromosome, however, is no more than L/2
loci distant from any other locus on the chromosome. The 
loci near the middle will exhibit different patterns of 
recombination than those on the ends. Their average 
distance from all other loci will be much smaller than that 
of an endpoint locus. 

The differences in average linkage distances between 
loci are eliminated by the use of circular chromosomes. The 
probability of the alleles at any two loci being separated 
by crossover changes as well. Two alleles on one circular 
chromosome are separated if a double crossover occurs with 
one crossover point on either "side" of one allele. In 
fact, linear chromosomes with single crossover can be viewed 
as a special case of circular chromosomes with double
crossover in which one crossover point is fixed at the end 
of the chromosome. 
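
The difference in average linkage distance between linear and circular chromosomes can be illustrated numerically. This is a minimal sketch; both function names and the 0-based locus numbering are assumptions made here, and distance on the circle is taken to be the shorter of the two arcs.

```python
def mean_distance_linear(locus, length):
    """Average distance from `locus` (0-based) to every other locus
    on a linear chromosome with `length` loci."""
    return sum(abs(locus - other) for other in range(length)
               if other != locus) / (length - 1)

def mean_distance_circular(locus, length):
    """The same average on a circular chromosome, using the shorter arc."""
    return sum(min(abs(locus - other), length - abs(locus - other))
               for other in range(length) if other != locus) / (length - 1)
```

For a chromosome of 9 loci, the linear averages are 4.5 for an endpoint locus and 2.5 for the middle locus, whereas on the circular chromosome every locus has the same average of 2.5.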

It is unknown whether the use of circular chromosomes
and double crossover would affect algorithm performance.
This work will consider only the simple case of single
crossover on linear chromosomes.


2.3.4 Dynamic Genotype Organization

It has been assumed up to now that the representation 
of the space S is determined initially and remains fixed 
during the search process. This need not be the case. In 
nature organism genotypes are rarely static; instead they 
develop dynamically during adaptation. 

There are two main features of the genotype which may 
be subject to dynamic alteration: the set of loci which are 
represented and the linkage relationships among those loci. 
The set of loci can be changed by deleting loci, making 
additional copies of loci, and by constructing new loci.
Linkage is altered by means of the inversion and 
translocation operators. 

Duplication, Deletion, and Random Generation of Loci. 
There is a convenient representation of solutions to 
function optimization problems as binary strings. Each 
binary locus appears once in a genotype and the allelic 
value of that locus has a straightforward interpretation in 
the phenotype map D. If the map D is expanded to include a 
dominance scheme, multiple instances of the same locus can 
exist in one organism, as is the case in diploid 
individuals. 

However, there is no reasonable interpretation for the
absence of a locus in a genotype. The transformation using
D could include a random allele generator to expand
deficient genotypes into legal phenotypic solutions, but
this seems unnatural and of doubtful utility.
Alternatively, the alleles so generated could be fixed as
0's or 1's, but this would be equivalent to the emphasis of
a certain subset of S during the search. This could be
justified only on the basis of a priori information.

In other problem paradigms, especially those requiring 
"set" representations (see Section 2.3.5), there is a 
natural interpretation of multiple and missing loci. 
Cavicchio (1970) studied duplication and random generation 
of loci in one such context, but not deletion. He concluded
that intrachromosomal duplication was beneficial to the 
genetic algorithm, but that random generation of new loci 
was deleterious. 

The desirability of allowing multiple and missing loci 
in genetic algorithms for function optimization has not been 
investigated. Deletion of loci intuitively seems harmful. 
If duplication is allowed, some sort of dominance mechanism 
must accompany it. In this respect, the result of
duplication on algorithms for function optimization problems
would resemble that of polyploidy except for linkages. A 
study of diploidy should indicate the benefits of multiple 
loci in the genotype; further research could then reveal the 
utility of developing such multiple-loci genotypes 
dynamically via duplication. 

Linkage Alteration: Inversion and Translocation. As
mentioned before (Section 2.3.3), the notion of linkage is
crucial to a genetic algorithm. Obviously the best
genotypic representation for a problem is that which groups
interactive loci together and separates independent loci on
separate chromosomes. Of course, if the interactions are
unknown, as is usually the case, it is impossible to create
such an optimal organization initially.

Inversion and translocation are two operators which 
alter linkages during offspring creation. Inversion removes 
a section of a chromosome, flips it end for end, and places 
it back in the chromosome (see Figure 2.2a). Translocation 
appends a terminal section of one chromosome to another 
(Figure 2.2b). Natural systems have mechanisms for both
operators; these mechanisms need not be described here.

Figure 2.2. Linkage Alteration Operators
(a) Inversion 
(b) Translocation 
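
The two linkage alteration operators can be sketched as list operations. This is an illustrative sketch under stated assumptions: chromosomes are Python lists, each element may be a (locus identifier, allele) pair so that a locus keeps its meaning after it is moved, and the function names and index conventions are hypothetical.

```python
def invert(chromosome, start, end):
    """Return a copy of the chromosome with the segment
    chromosome[start:end] flipped end for end."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]

def translocate(donor, recipient, cut):
    """Move the terminal section donor[cut:] onto the end of recipient.
    Returns the shortened donor and the lengthened recipient."""
    return donor[:cut], recipient + donor[cut:]
```

Inversion changes which loci are adjacent without changing which loci are present, while translocation changes the length of both chromosomes involved.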


Preliminary investigations (Holland, 1975) indicated
that use of inversion would allow a genetic algorithm to
search for an optimal genotype organization concurrently
with the search for optimal solution points. After
experimentation with various implementations, however,
Frantz (1972) concluded that inversion did not contribute to
algorithm performance. He hypothesized that inversion might
be useful in environments requiring longer chromosomes (his
were roughly 25 loci each), or on problems employing a very
long period of adaptation. De Jong (1975) agreed that 
inversion does not play a useful role in genetic algorithms 
for function optimization problems. One of the biological 
effects of inversion, which is to prevent crossover between 
inverted and noninverted chromosomes, has not been examined 
in research on genetic algorithms. 

The translocation operator has not been studied yet 
within the genetic algorithm. In view of the performance of 
the inversion operator, there seems to be no good reason to 
study it extensively. In general, operators on genotypic
organization may be too subtle to show any effects when used
within genetic algorithms applied to function optimization
problems.


2.3.5 The Semantics of Position

The genetic algorithm as presented in this chapter 
relies on the representation of solutions as chromosomes. 
Each locus has its own allele set. The allele at one locus 
plays a specific and well-defined role in the phenotype map 
D. In a binary representation, one locus is defined to be
the "ones" bit, one the "twos" bit, and so on, reflecting
the powers of two by which the alleles are to be multiplied.
Even if the loci are shuffled on the chromosome, the
"meaning" of each locus must be retained. There is semantic
significance to each chromosomal position.

Many problems have representations with no intrinsic
semantic interpretation for the loci, for instance, tasks in 
the areas of pattern detection and automatic programming. 
Assume, for example, that in a pattern detection problem a 
picture is represented by a 625x625 grid of 0's (light) and
1's (dark). The 390,625 grid sectors on a picture cannot be
compared to the sectors in a set of, say, 10 "pattern"
pictures in order to locate the closest matching pattern.
It is necessary to define a "detector set". Each detector
is a function which gives a description of a small set of
grid sectors, perhaps 5 sectors. The information from a set
of 20 or so detectors is used in comparisons of pictures and
patterns, and will permit optimal matching of pattern to
picture.

The selection of a detector set for a given pattern set 
is a difficult problem, possibly suitable for a genetic 
algorithm (see Cavicchio, 1970). Two representations 
immediately present themselves. The first is a bit string,
a chromosome where each locus i*j has an allele to indicate
the presence or absence of grid sector i in detector j.
Each locus has a natural meaning, but there are 7,812,500
loci! This may be difficult to implement.

In the second representation, the chromosome consists
of a string of grid numbers, each representing one sector
which is present in a detector. Each locus has the allele
set {1, 2, ..., 390625}. Now the chromosome is of length 100;
this is manageable. But the loci do not have any
significance other than that of set membership. An allele
of 3 in the first locus has the same meaning as the allele 3
appearing in the fifth locus: sector 3 is in the first
detector.
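
The chromosome lengths quoted for the two representations follow directly from the sizes given in the example. The short calculation below uses the figures from the text (a 625x625 grid, 20 detectors, 5 sectors per detector); the constant names are chosen here for illustration.

```python
GRID_SIDE = 625            # picture is a 625x625 grid
NUM_DETECTORS = 20         # "20 or so" detectors
SECTORS_PER_DETECTOR = 5   # "perhaps 5 sectors" per detector

num_sectors = GRID_SIDE * GRID_SIDE
bit_string_loci = num_sectors * NUM_DETECTORS                 # one locus per (sector, detector)
set_representation_loci = NUM_DETECTORS * SECTORS_PER_DETECTOR  # one locus per listed sector
```

The bit string needs 7,812,500 loci with two alleles each, while the set representation needs only 100 loci, each with 390,625 possible alleles.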

A representation such as this without semantics of 
position may have profound effects on the genetic algorithm. 
In the set representation above, the crossover operator not 
only generates duplicate alleles (e.g. a 3 in both locus one 
and locus five is possible), but it is severely restricted 
in the manner that it can recombine subsets. A 3 in locus 
one and a 5 in locus two will stay together in offspring 
with higher probability than the 3 in locus one and a 7 in
locus five, although the 3, 5, and 7 are all members of the
same detector and should share the same relationships.
Linkage here becomes a liability; it imposes an artificial 
and undesirable structure on the sets represented. 
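
The unequal treatment of set members can be quantified. The sketch below assumes a single crossover whose point is equally likely to fall at any of the L-1 positions between adjacent loci, and gives the probability that it separates two given loci; the function name and the 1-based positions are assumptions made here.

```python
def separation_probability(locus_i, locus_j, length):
    """Probability that one uniformly placed crossover point falls
    between two loci (1-based positions) on a chromosome of `length` loci,
    given that a crossover occurs."""
    return abs(locus_j - locus_i) / (length - 1)
```

On the chromosome of length 100 from the example, alleles in loci one and two are separated with probability 1/99, but alleles in loci one and five with probability 4/99, although all belong to the same detector.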

The theoretical support for genetic algorithms stems
mainly from Holland's (1975) analysis of hyperplane (schema)
behavior within the Reproductive Plan (see Chapter 3).
These hyperplanes are position dependent. Thus the
analytical foundation of genetic algorithms may not be valid
for problems using set representations. It is unclear
whether on a set representation the recombination mechanism
of crossover is appropriate, and whether or not the genetic
algorithm can be expected to exhibit good performance.
Perhaps a new operator (or operators) is needed for the
recombination of sets.

Since function optimization problems employ a binary
string representation, the discussion of position semantics
and set representations need not be pursued further here.


2.4 THE PROBLEM OF LOSS OF POPULATION VARIANCE 

Recombination is effective as a search mechanism only 
as long as there are many different genotypes represented in 
the population. A population may become homogeneous, that 
is, all its organisms have similar genotypes. On such a 
population, crossover ceases to be a tool for exploration. 
Crossing two identical chromosomes yields the same 
chromosome again. 

Therefore it is desirable to maintain a varied 
population within the genetic algorithm for the duration of
the search process. With respect to function optimization, 
the loss of population variance is equivalent to convergence 
of the algorithm (see Martin, 1973, for an analysis of the 
convergence properties of a simplified Reproductive Plan). 
If the point of convergence is not optimal, the algorithm 
has undergone premature convergence. 

Premature convergence has been a problem in all 
implementations of genetic algorithms examined thus far. It 
stems from the necessary finiteness of the population and an 
improper balance in the competition for population space 
between history retention and exploration. An adaptive
search algorithm must maintain a history of the search in
order to direct further search, and must also provide
temporary space for the information resulting from current
evaluations, e.g. the solution just tested and its value.
In a genetic algorithm, the population serves both memory 
functions. A severe imbalance in population usage in favor 
of history retention results in immediate convergence, 
probably to a very poor solution. At the other extreme is 
random search, caused by the ignorance of search history 
during exploration. Somewhere in between lies the desired 
balance which yields adaptive search. 

The effects of population size, operator rates, parent 
selection methods, and replacement selection methods on 
population variance have been studied by Cavicchio (1970) 
and De Jong (1975). This work continues with a further
examination of accurate sampling in parent selection
(Chapter 4) and diploidy (Chapter 5) as possible mechanisms
for maintaining population variance and improving algorithm
performance.


2.5 CONCLUSIONS

There are many possible implementations of the basic 
Reproductive Plan, and many variations on problem 
representation. For unconstrained, theoretically unbounded 
function optimization problems, a binary string 
representation has been selected, so that the semantics of 
position are straightforward. Genotypes of single
chromosome and two chromosome, diploid organization will be
studied.

The goal of the experiments presented in the next three
chapters will be to improve algorithm performance on hard
functions through reductions in the loss of population
variance. Chapter 6 will attempt to verify that a good
genetic algorithm is useful in the field of function
optimization as a robust problem solving method.

The following particular Reproductive Plan will be
employed hereafter:


1. n organisms, n >= 50, are generated according to
   a uniform random distribution over the set S.

2. 2n parents are selected by sampling from a
   distribution over the population. This
   distribution reflects the relative fitness,
   f(s_i) / \sum_j f(s_j), of each organism s_i.

3. n parental groups are formed by random pairing.
   Single crossovers occur between chromosomes
   with a frequency P_c = 0.6. Mutation occurs
   at each locus with a frequency P_m. One
   offspring is chosen at random from each
   parental group.

4. The old population is completely replaced by
   the n offspring.

5. The cycle repeats from step 2.
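
The steps of this plan can be sketched in code. This is a minimal sketch, not the implementation used in this work. The function name `reproductive_plan`, its parameters, the fixed number of generations, the tracking of the best organism seen, and the fallback to uniform selection when every fitness value is zero are all assumptions made for illustration. The default p_m = 0.001 is De Jong's setting, assumed here because the plan itself does not fix a value. Fitness values are assumed to be non-negative.

```python
import random

def reproductive_plan(fitness, chrom_len, n=50, p_c=0.6, p_m=0.001,
                      generations=100, seed=0):
    """Generational plan: fitness-proportional parent selection, random
    pairing, single crossover, mutation, and complete replacement.
    `fitness` maps a list of 0/1 alleles to a non-negative number."""
    rng = random.Random(seed)
    # Step 1: uniform random initial population.
    population = [[rng.randint(0, 1) for _ in range(chrom_len)]
                  for _ in range(n)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Step 2: sample 2n parents in proportion to relative fitness.
        weights = [fitness(s) for s in population]
        if sum(weights) == 0:
            weights = [1] * n  # assumed fallback: uniform selection
        parents = rng.choices(population, weights=weights, k=2 * n)
        # Step 3: random pairing, crossover, one offspring per pair, mutation.
        rng.shuffle(parents)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if chrom_len > 1 and rng.random() < p_c:
                point = rng.randint(1, chrom_len - 1)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            child = rng.choice((a, b))
            child = [1 - g if rng.random() < p_m else g for g in child]
            offspring.append(child)
        # Step 4: the old population is completely replaced.
        population = offspring
        best = max(population + [best], key=fitness)
        # Step 5: the loop repeats from step 2.
    return best
```

A call such as `reproductive_plan(sum, 8)` runs the plan on a toy function that counts the 1 alleles in an 8-locus chromosome.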
CHAPTER 3


FUNCTION OPTIMIZATION BY GENETIC ALGORITHMS


The characteristics of functions which are "hard" to
optimize fall into four categories: a priori ignorance,
static complexity, dynamic ignorance, and dynamic
uncertainty (see Section 1.2). In this chapter, many
features of static complexity are examined and a set of test
functions generated which will contrast the performance of
different genetic algorithm implementations. The
restriction of attention to features of static complexity
should facilitate the development of a genetic algorithm
suitable to mathematical function optimization. This
algorithm can then be compared to other optimization methods
for static, deterministic functions.


3.1 DIFFICULT FUNCTIONS

An optimization algorithm can be described in terms of 
function features it can handle, features which impair or 
prevent its operation, and features which do not affect its 
performance. A brief review of available optimization 
methods will indicate what some of these features may be. 
The discussion of function features involves several 
mathematical terms which are defined in Appendix A. For 
simplicity, "optimize" will hereafter be interpreted as
"maximize". The results are of course applicable to
minimization problems as well.

Indirect methods usually require differentiability
and/or continuity plus a function in closed form. Thus they 
are not applicable to functions specified in the form of 
tables or processes. However, indirect methods can optimize 
some constrained functions, some multidimensional, unimodal 
functions, and many multimodal, one dimensional functions. 

Random search is blind to most function 
characteristics. Its expected behavior is determined by the 
size of the search space, the proportion of good points in
the space, and the number of evaluations allowed during
search.

Adaptive search methods vary considerably in their 
ability to cope with variations in constraints, 
dimensionality, differentiability, and continuity. In 
general, traditional feedback methods are appropriate to 
unimodal functions only. Methods in linear, nonlinear, 
integer and dynamic programming are designed to handle 
constraints. Dichotomous and Fibonacci search methods are 
restricted to bounded functions of one dimension, while 
similar multidimensional tabulation techniques (e.g.
multivariate grid search) become unwieldy for large numbers 
of dimensions. 

Some feedback algorithms are designed for unbounded, 
multidimensional functions. They rely on strict unimodality 
in the function when depicted in Euclidean space, and thus 
are susceptible to suboptimal ridges. Gradient methods find
the direction of greatest slope and follow it. These
methods require differentiability, although the derivatives
can often be approximated using multiple function 
evaluations. Direct search methods do not use derivatives, 
but usually exhibit slower convergence properties near the 
optimum than gradient methods. 

Genetic algorithms represent a radical departure from
all previous search techniques. It is necessary to
determine the extent to which the features of difficulty
mentioned above influence or restrict their performance.


3.1.1 Differentiability and Continuity

Genetic algorithms operate on discrete representations
of functions and do not use derivatives. As will be seen in
the discussion of modality, gross discontinuities in the
Euclidean representation of a function need not impair
algorithm performance.


3.1.2 Constraints

The ability of genetic algorithms to optimize within 
constraints has not been studied. The simplest way to 
incorporate constraints into the environment would be to 
assign very low values to organisms which fall outside the 
constraints. It is not clear whether the presence of such 
"illegal" organisms in the population would hinder or even 
destroy the search capabilities of the algorithm. This
would make an interesting research subject, but will not be
investigated here.


3.1.3 Dimensionality

The search performance of genetic algorithms is not 
affected by the Euclidean dimensionality of the function. 
Due to the chromosomal structure given to solution points, 
multidimensional functions have representations similar to 
one-dimensional functions. As an example, a point in three 
dimensions is represented by concatenating the loci for each 
x-value. A function $f(x_1, x_2, x_3)$ defined over the integers 
$0 \le x_i \le 63$, $i = 1, 2, 3$, will be optimized in exactly the same 
manner as a one-dimensional function $g(x)$, where 

$$x = 64^2 x_1 + 64 x_2 + x_3,$$

$x_i$ is the remainder of $x / 64^{3-i}$ (integer division) divided by 64, and 

$$g(x) = f(x_1, x_2, x_3).$$

For any bounded function, such a one-dimensional counterpart 
is easily constructed. 
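
The folding of three coordinates into one integer can be sketched as follows. This is only an illustration: the objective `f` is a hypothetical placeholder, and the names `fold`, `unfold` and `g` are not taken from the original programs.

```python
def fold(x1, x2, x3, base=64):
    """Concatenate three coordinates in [0, base) into a single integer."""
    return base * base * x1 + base * x2 + x3


def unfold(x, base=64):
    """Recover (x1, x2, x3) using integer division and remainders."""
    return (x // (base * base)) % base, (x // base) % base, x % base


def f(x1, x2, x3):
    # Hypothetical three-dimensional objective, used only for illustration.
    return -((x1 - 10) ** 2 + (x2 - 20) ** 2 + (x3 - 30) ** 2)


def g(x):
    """One-dimensional counterpart of f, defined for 0 <= x < 64**3."""
    return f(*unfold(x))
```

Since `fold` and `unfold` are inverses on this range, optimizing `g` over the integers 0 to 64^3 - 1 visits exactly the same 18-bit strings as optimizing `f` over three 6-bit coordinates.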

The performance of a genetic algorithm will be 
influenced not by how many dimensions there are in the 
domain of a function $f(x_1, \ldots, x_d)$, but by how complex an 
equivalent one-dimensional function $g(x)$ is. 


3.1.4 Modality 

Genetic algorithms have the ability to optimize 
multimodal functions in one or more dimensions (De Jong, 
1975), yet some rather simple unimodal functions cause them 
great difficulty. To understand why, the question of 
dimensionality must be further examined. 

Let $f(x) = 64x - x^2$ be defined over the integers 0 to 
63. $f$ is a parabola with one global maximum at 32. A 
genetic algorithm, when trying to locate this maximum, will 
very often converge on the value 31, never reaching the 
"neighboring" point, 32. The reason is that the points 31 
(011111) and 32 (100000) are not neighboring in the binary 
representation manipulated by the algorithm. The 
probability of generating 100000 by crossover or mutation 
from a population of "good" solutions in the range 20-30 is 
minuscule, and crossover between points such as 31 and 37 
(011111 x 100101) will produce poorer offspring such as 5 
(000101) and 63 (111111). 
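
The parabola example can be checked with a few lines of Python. The sketch assumes a simple one-point crossover that swaps the tails of two bit strings after a chosen position; the helper names are illustrative only.

```python
def f(x):
    """The parabola 64x - x^2 on the integers 0..63 (maximum at x = 32)."""
    return 64 * x - x * x


def one_point_crossover(a, b, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]


def hamming(a, b):
    """Number of loci at which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


p1, p2 = "011111", "100101"             # 31 and 37
c1, c2 = one_point_crossover(p1, p2, 1)  # cut after the first locus
parents = [int(p, 2) for p in (p1, p2)]
children = [int(c, 2) for c in (c1, c2)]
```

Here `children` is `[5, 63]`, with function values 295 and 63, against 1023 and 999 for the parents. The strings for 31 and 32 differ at all six loci, which is why the step from 31 to 32 is so unlikely.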

The actions of the crossover and mutation operators are 
difficult to analyze in terms of one-dimensional Euclidean 
space. The examination of a one-dimensional counterpart for 
any function does not increase understanding of how easily 
it will be optimized by a genetic algorithm. The 
examination of a multidimensional counterpart does. 

Let $l_i$ be the number of loci used to represent $x_i$, 
$i = 1, \ldots, d$, in a point in d-dimensional Euclidean space, and let 

$$L = \sum_{i=1}^{d} l_i.$$

Consider an L-dimensional discrete space with 
only two possible values per dimension, 0 and 1. Clearly, 
all d-dimensional, discrete, bounded functions have many 
such L-dimensional counterparts. 

The distance between two points $(x_1, \ldots, x_d)$ and 
$(y_1, \ldots, y_d)$ in Euclidean space is 
$\left( \sum_{i=1}^{d} (x_i - y_i)^2 \right)^{1/2}$. In the 
L-dimensional space above, the distance metric is Euclidean 
distance squared. Since $x_i$ and $y_i$ are limited to 0 and 1, 

$$\sum_{i=1}^{L} (x_i - y_i)^2 = \sum_{i=1}^{L} |x_i - y_i| = \sum_{i=1}^{L} (1 - \delta_{x_i y_i}),$$

where $\delta_{ab}$ (the Kronecker delta) = 1 if a=b, 0 otherwise. 

This particular kind of space is called a Hamming 
space, and its simplified distance measure is Hamming 
distance. A genetic algorithm searches the L-dimensional 
Hamming space counterparts of functions. Mutation changes a 
point to one of the adjacent points one unit of distance 
away. When used with low frequency, mutation forms a 
mechanism for local search about a point. 
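
As a small illustration of these definitions, the sketch below computes Hamming distance on bit strings, confirms that it coincides with squared Euclidean distance on 0/1 coordinates, and lists the points reachable by a single mutation. The function names are illustrative.

```python
def hamming(a, b):
    """Number of loci at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Squared Euclidean distance, treating each locus as a 0/1 coordinate."""
    return sum((int(x) - int(y)) ** 2 for x, y in zip(a, b))


def mutation_neighbors(point):
    """All points exactly one unit of Hamming distance away from `point`."""
    flip = {"0": "1", "1": "0"}
    return [point[:i] + flip[c] + point[i + 1:] for i, c in enumerate(point)]
```

A point in L-dimensional Hamming space therefore has exactly L mutation neighbors, regardless of how far apart those neighbors lie when decoded as integers.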

Crossover is more complicated. Define a k-hyperplane 
in Hamming space as the set of all points which have the 
same fixed values for the same k loci. A k-hyperplane can 
be described by the pattern of fixed loci which all of its 
elements match. For example, one 2-hyperplane in four 
dimensional space is {0110,0100,1110,1100}; this set matches 
the pattern "-1-0". 
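
The pattern notation can be made concrete with a short sketch, in which "-" marks an unfixed locus. The helper names are illustrative; the enumeration is exhaustive and only practical for small L.

```python
from itertools import product


def matches(pattern, point):
    """True if `point` agrees with `pattern` at every fixed locus."""
    return all(p == "-" or p == c for p, c in zip(pattern, point))


def members(pattern):
    """All points of the Hamming space that lie on the given hyperplane."""
    points = ("".join(bits) for bits in product("01", repeat=len(pattern)))
    return {pt for pt in points if matches(pattern, pt)}


def order(pattern):
    """The k of a k-hyperplane: the number of fixed loci."""
    return sum(c != "-" for c in pattern)
```

For the pattern "-1-0", `members` returns the four points 0110, 0100, 1110 and 1100, and `order` returns 2.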

One approach to searching Hamming or other spaces is to 
determine which features, or combinations of features 
(hyperplanes), are responsible for good solutions, and to 
recombine sets of good features to yield still better 
solutions. Although the apportionment of credit to 
hyperplanes would be a valuable technique for search, it 
cannot be implemented explicitly. 

In L-dimensional Hamming space there are $3^L - 1$ 
hyperplanes of interest (the 0-hyperplane does not figure in 
apportionment of credit). In order to identify good 
hyperplanes, it is necessary to sample points from many 
hyperplanes. Each point in the space is a member of $2^L$ 
hyperplanes, however, so evaluating one point yields 
information about many hyperplanes. Thus it is possible to 
sample sufficient points to indicate the utility of most 
hyperplanes, but storage of the summary information about 
each hyperplane is difficult. 
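
These counts can be verified by brute force for a small space. In the sketch below each locus of a pattern is 0, 1, or unfixed ("-"), so there are 3^L patterns in all, one of which (all loci unfixed) is the 0-hyperplane; a given point lies on the 2^L patterns obtained by unfixing any subset of its loci. The function names are illustrative.

```python
from itertools import product


def all_patterns(length):
    """Every hyperplane pattern over the alphabet {0, 1, -}."""
    return ["".join(p) for p in product("01-", repeat=length)]


def patterns_containing(point):
    """Every hyperplane pattern that the given point is a member of."""
    return [
        pat for pat in all_patterns(len(point))
        if all(a == "-" or a == b for a, b in zip(pat, point))
    ]


L = 4
total = len(all_patterns(L))                 # 3**4 = 81
of_interest = total - 1                      # excludes "----"
per_point = len(patterns_containing("0110")) # 2**4 = 16
```

One evaluation thus contributes to the observed utility of 16 hyperplanes when L = 4, and of roughly a billion when L = 30, which is why explicit bookkeeping is impractical.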

Holland (1975) has shown that in a genetic algorithm 
the population M can be expected to contain each hyperplane 
with frequency proportional to its observed utility. The 
population provides implicit storage of hyperplane 
utilities. A genetic algorithm, therefore, has the 
capability to search by means of apportionment of credit, 
provided that some mechanism for the recombination of 
hyperplanes is employed. This is the purpose of crossover. 

Crossover recombines hyperplanes in the population. In 
particular, the crossover operator, described in 
Section 2.2.3, will keep closely linked loci together while 
separating distant loci to effect recombination. As an 
illustration, consider a four dimensional space. An 
instance of the 2-hyperplane 10-- may occasionally be broken 
by crossover between loci one and two, but more often 
crossover will occur between loci two and four, recombining 
it intact with other hyperplanes. An instance of 
the hyperplane 1--0, on the other hand, will usually be 
destroyed by crossover, although it may be regenerated in 
the offspring from a similar hyperplane on the second 
chromosome. 
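
The effect of linkage can be quantified under a simplifying assumption: a single crossover point chosen uniformly among the L - 1 gaps between adjacent loci. This is an assumption made for illustration and may differ in detail from the operator of Section 2.2.3. A hyperplane is cut whenever the point falls between its outermost fixed loci.

```python
from fractions import Fraction


def defining_length(pattern):
    """Distance between the first and last fixed loci of a pattern."""
    fixed = [i for i, c in enumerate(pattern) if c != "-"]
    return fixed[-1] - fixed[0]


def cut_probability(pattern):
    """Chance that one uniformly chosen crossover point separates the
    outermost fixed loci (assumes a single-point crossover)."""
    return Fraction(defining_length(pattern), len(pattern) - 1)
```

Under this assumption 10-- is cut with probability 1/3 and 1--0 with probability 1. A cut does not always destroy the hyperplane, since the other parent may supply matching values, so these figures are upper bounds on disruption.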

The operation of the genetic algorithm does not 
preclude the evaluation of hyperplanes containing distantly 
linked loci. However, more recombination and evaluation 
will occur for hyperplanes with closely linked loci. In 
general, too, more evaluations will occur for hyperplanes 
with few defining loci than for those with many. 

A genetic algorithm will be more successful in 
apportioning credit for performance to hyperplanes with a 
few, closely linked loci than to hyperplanes with many or 
distantly linked loci. If the good values for a function can be 
attributed to the independent contributions of each locus, 
or of small sets of loci, then the genetic algorithm should 
be able to combine the independent effects to produce an 
optimal solution. If credit cannot be allocated 
independently to small, close groups of loci, then there 
must be dependencies, or nonlinearities, between large 
numbers of loci or distantly linked loci. These 
nonlinearities could impair algorithm performance. 

A genetic algorithm should be effective at optimizing 
functions which are linearly independent over the loci in 
Hamming space. In terms of the Euclidean space over which 
the function may originally be defined, bounds are 
necessary, differentiability and dimensionality are 
unimportant, and discontinuities and multimodality do not 
restrict the algorithm, but must be viewed in light of their 
effects on the nonlinearity of the function in Hamming 
space. 

Since multimodality has been the bane of most 
traditional optimization methods, the ability of genetic 
algorithms to optimize multimodal functions may prove to be 
their most important feature. This work will concentrate on 
the development of genetic algorithms for multimodal 
functions, using test functions which exhibit various locus 
dependencies in their Hamming space representations. All 
functions will be unconstrained, but bounded. Most will be 
one dimensional in their original representation, for ease 
of presentation. 


3.2 DESIGN OF EXPERIMENTS 

The experiments to test features of and to compare 
genetic algorithms to other methods consisted of two or more 
algorithms run on a set of test functions. In order to 
identify interactions between algorithm types and functions, 
a factorial design was used for all experiments, with 
analysis of variance for main effects and their 
interactions. 

F-ratios were defined as the mean square of effects 
over the mean square of an error term, or 

$$F(A \times B) = \frac{MS(A \times B)}{MS(\text{error})}.$$


Some of the experiments involved repeated measures (all 
factors listed after the replication factor in the 
experiment design). For interactions involving repeated 
measures, the denominator of the F-ratio was MS(within cell 
x any repeated measures); otherwise it was simply MS(within 
cell) (Winer, 1962). The level of statistical significance 
of effects was measured at 5%, with sufficient numbers of 
runs to satisfy assumptions of normality in the analysis (at 
least 30 degrees of freedom for the F-ratio denominator). 

Experiments were run on a PDP-11/60 under the UNIX 
operating system using simulation programs written in the 
language "C". Analytical results were obtained using 
programs from the IMSL (1975) statistical package running on 
an Amdahl 470/V7 under the Michigan Terminal System. 

Genetic algorithms are stochastic processes. The 
validity of experiments involving these algorithms is 
dependent upon the degree of randomness of the pseudo-random 
numbers employed. One run of a genetic algorithm with 
diploid population of size 200, chromosomes of length 50, 
and duration 100 generations could require as many as 40,000 
(about >) random numbers for mutation alone. The UNIX 
random number generator, which uses integer arithmetic and 


NHS) 


has a cycle period of 2°~-1, was considered inadequate for 


experiments of several runs. 
The multiplicative congruential generator 

$$X_{i+1} = 16384 \cdot X_i \pmod{268435399}$$

was designed according to the mathematical requisites of 
Downham and Roberts (1967), to provide a large period 
(about $2^{28}$) by using "C" double-precision floating point 
arithmetic. Sequences of numbers from this generator passed 
several statistical tests of randomness at the 5% level, 
including the $\chi^2$ uniformity, gap, runs up-and-down, and 
serial tests (Downham and Roberts, 1967). 
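
The recurrence itself is simple to reproduce. The sketch below is not the original "C" program: it uses Python integers instead of double-precision floats, and the names `lcg` and `uniform` are illustrative. The seed is assumed to lie between 1 and 268435398.

```python
MULTIPLIER = 16384
MODULUS = 268435399


def lcg(seed):
    """Yield successive values of X[i+1] = 16384 * X[i] mod 268435399."""
    x = seed
    while True:
        x = (MULTIPLIER * x) % MODULUS
        yield x


def uniform(seed):
    """Yield the same sequence scaled into the open interval (0, 1)."""
    for x in lcg(seed):
        yield x / MODULUS


gen = lcg(1)
first_three = [next(gen) for _ in range(3)]
```

The largest intermediate product, 16384 times 268435398, is below 2^53, so it is represented exactly in an IEEE double. That is one reason double-precision arithmetic suffices for a generator of this size on a 16-bit machine.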

For simplicity, genetic algorithms will be identified 
in experiment summaries only by those features which 
differentiate them from the algorithm finally chosen by 
De Jong (1975). The parameters of this default 
implementation are listed below. Many of these will not be 
fully explained until later chapters. 


representation = binary 
population size (n) = 50 
generation size (g) = 50 
number of chromosomes per haploid genotype = 1 
ploidy = 1 (haploidy) 
dominance scheme = random 
crossover probability per chromosome (P_c) = 0.6 
number of crossover points per chromosome 
mutation probability per locus (P_m) = 0.001 
parent selection = stochastic without replacement 
parent distribution base = worst all-time 
mating = random 
replacement selection = random 


3.3 MEASURES OF PERFORMANCE 
All optimization methods were run in fixed resource 
mode, terminating on a preset number of function 
evaluations. Comparisons are on the basis of overall best 
performance, or the maximum function value achieved on any 
evaluation up to the termination point, because this seems 
to be the most important measure for function optimization. 
Online performance was also monitored as an aid to 
understanding algorithm strengths and weaknesses. Offline 
performance was not analyzed, since plots of overall best 
performance over time seemed to convey sufficient 
information on the rate of improvement of overall best 
values. 

For those experiments concerned with changes in 
population variance, it was necessary to formulate possible 
measures of variance. Let a population contain chromosomes 
of length L, with B alleles (for base, e.g. binary, decimal) 
possible at each locus. For each allele $a_j$ at locus $i$, 
$i = 1, \ldots, L$, $j = 1, \ldots, B$, let $P_{ij}$ be the proportion of the 
chromosomes in the current population which contain $a_j$ at locus $i$. 

Fixation of loci is measured as the proportion of loci 
which are homogeneous for one allele, that is, for which the 
population has converged completely. 

The sample variance of the allele frequencies is computed as 


L B 
Variance of Frequency = —-- ) Otleg ~ h) 2 
i=l j=l 
al = 2 
pa Cree 
‘ qe dha; ! Bl: 
i=l j=l 


Fixation indicates how many loci have reached extremes 
of allele frequency, while variance of frequency indicates 
the amount of variation of all alleles from the frequency 
value associated with high population variance. It may also 
be useful to detect the degree to which one allele is 
dominating a locus, in other words, to measure the amount of 
convergence at each locus. For this purpose, the average 
maximum frequency is defined as 

$$\text{Average Maximum Frequency} = \frac{1}{L} \sum_{i=1}^{L} \max_{j=1,\ldots,B} P_{ij}$$

To determine whether all loci are converging similarly, the 
sample variance of maximum frequencies may be used: 
Variance of Maximum Frequency = 
where AMF is the Average Maximum Frequency. 
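
Two of these measures, Fixation and Average Maximum Frequency, follow directly from their verbal definitions and can be sketched as below. The code assumes a population given as a list of equal-length strings over a known allele alphabet; the function names are illustrative and the two variance measures are not implemented here.

```python
from collections import Counter


def allele_frequencies(population, alleles="01"):
    """For each locus, the proportion of chromosomes carrying each allele."""
    n = len(population)
    length = len(population[0])
    freqs = []
    for i in range(length):
        counts = Counter(chromosome[i] for chromosome in population)
        freqs.append([counts[a] / n for a in alleles])
    return freqs


def fixation(freqs):
    """Proportion of loci at which a single allele has frequency 1."""
    return sum(1 for locus in freqs if max(locus) == 1.0) / len(freqs)


def average_maximum_frequency(freqs):
    """Mean over loci of the frequency of the most common allele."""
    return sum(max(locus) for locus in freqs) / len(freqs)


population = ["0000", "0001", "0011", "0111"]
freqs = allele_frequencies(population)
```

For this four-chromosome population only the first locus is fixed, so Fixation is 0.25, while the Average Maximum Frequency is 0.75. For binary alleles the latter ranges from 0.5 (maximum variance) to 1.0 (complete convergence).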


3.4 TEST FUNCTION SET I 

The test functions designed for Chapters 4 and 5 
are defined over $2^{30}$ points with one global maximum of value 
100 (or nearly 100). The set includes functions which in 
Hamming space are linearly independent, and functions which 
have closely linked and distantly linked dependent loci. 
3.4.1 Function 1 
Let $X = (x_1, \ldots, x_{30})$ be a point in 30-dimensional 
Hamming space. 


where $\delta_{ab} = 1$ if a=b, 0 otherwise. 
Maximum: f1(0, 0, ..., 0) = 100 
Euclidean space modality: an equivalent one 
dimensional function, fle, over the integers 


50 ey would have 2°? GSax 10°) peaks. 


Oe 
f1 is linearly independent over the loci in Hamming 
space. It should be very easy for a genetic algorithm to 
optimize. The equivalent one dimensional function f1e is 
shown in Figure 3.1. 


3.4.2 Function 2 
Let $X = (x_1, \ldots, x_{30})$ be a point in 30-dimensional 
Hamming space. Group the terms $x_i$ as follows: 


Gy = {X)1%97%37%oq7%X39} 


@) 
iW 
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5 = XgrXyQ0Xy50%X1 0X03) 


6 = 'XgrXy 91%) 30% 67X19) 
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1] would be highly multimodal. 
f2 should be very difficult for a genetic algorithm to 
optimize. The loci in each group are dependent and located 
all over the chromosome. The equivalent function f2e is 
shown in Figure 3.2. 

Euclidean space modality: $2^6$ (64) peaks. 


Each sine wave in the sum for £3 is constructed with 


etl and first peak at ae ae so that 2£ “ocus 30-1 


period 2 
is 1 the evaluation for that sine wave will be superior to 
when locus 30-i is 0. Thus in Hamming space f3 will have 
seven important, but independent, loci grouped at one end of 
the chromosome. f3 is shown in Figure 3.3. 

3.4.4 Function 4 
£4(X) = ) c; sin(m(X/2*+1)) 
Maximum: f4(788053000) = 100.0 
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Euclidean space modality: 2 (53x: 210) )iv peaks. 
f4 is similar to f3, except that the important loci are 
widely distributed on the chromosome. f4 is shown in 
Figure 3.4. 

Maximum: f5(1061130000) = 100.0 
Euclidean space modality: approximately 68 peaks. 
The periods of the sine waves in f5 are odd, relatively 
prime, and in the range yee to oe This means that there 
will be a set of important loci which are dependent and 
closely linked at one end of the chromosome. f5 is shown in 


Figure 3.5. 

As in f5, the sine periods are odd and relatively prime. 
The wide range of period magnitudes should cause 
dependencies between loci distributed over the length of the 
chromosome. f6 is shown in Figure 3.6. 


3.4.7 Function 7 

Let $Y = X / 2^{30}$ for any integer X in the range $[0, 2^{30}]$. 


Fed Xs) 


= 100 ae sin (16m (Y+.05874169672) °“>). 


Maximum: f7(0) = 99.999985 

Euclidean space modality: 8 peaks. 

Function 7, a dampened sinusoid, is included because of 
its difficulty for all optimization methods. Its lack of a 
regular period should impair the performance of a genetic 
algorithm. f7 is shown in Figure 3.7. 


3.5 EXPERIMENT 1: RANDOM SEARCH VS. THE DEFAULT ALGORITHM 
The first experiment was designed to evaluate Test 
Function Set I, evaluate the various measures of population 
variance, and determine an appropriate run length for later 
experiments. The default genetic algorithm and random 
search were run on all seven functions. Factors in the 
experiment were: 

1. Functions (1, 2, 3, 4, 5, 6, 7) 

2. Replications (10) 

3. Algorithm (random search, 
default genetic algorithm) 

The results are listed in Table 3.1. 

As expected, the genetic algorithm performed well on 
Functions 1, 3, and 4, which are linearly independent in 
Hamming space. The result for Function 5 indicates that 
locus dependencies between seven loci hamper a genetic 
algorithm somewhat, even though the dependent loci are 
closely linked. Functions 2, 5, 6, and 7 are difficult 
functions for a genetic algorithm; in fact, on the last 
three functions random search was superior. 

The genetic algorithm performed well on both Functions 
3 and 4, indicating that as long as the loci are 
independent, it makes no difference which loci on the 
chromosome are important in the function evaluation. Since 
both algorithms reached near-optimal points on Function 3, 
this function will be removed from Test Function Set I. 
Functions 1 and 4 will be retained as benchmark "easy" 
functions, and Functions 2, 5, 6, and 7 as functions on 
which algorithm performance can be improved. 

The variance measures for the genetic algorithm 
population on each of the functions are given in Table 3.2. 
Variance of Frequency parallels Average Maximum Frequency in 
indicating overall population variance, but displays smaller 
differentials. Variance of Maximum Frequency exhibits a 
negative correlation to Average Maximum Frequency. This 
should be true whenever the Average Maximum Frequency 
approaches one. Average Maximum Frequency and Fixation seem 
to provide the clearest measures of population variance. 

The overall best performance of both algorithms is 
plotted against time in the graphs of Figures 3.8a through 
3.8c. In all cases except Function 7, the genetic algorithm 
had located its best value after 5000 to 7500 function 
evaluations. On Function 7 the algorithm reached a plateau 
at about 8500 evaluations. 

The Average Maximum Frequency of the genetic algorithm 
is plotted against time in Figure 3.9. On each of the 
functions, the algorithm reached a stable population of 
small variance after 6500 to 7000 evaluations. Search from 
then on would have been limited to random search by 
mutation. This explains the plateaus observed in the graphs 
of overall best performance. 

Further experiments will be terminated after 8000 
function evaluations to allow the genetic algorithm its full 
period of search by recombination. 


Table 3.1. Experiment 1: Random Search vs. Default Algorithm. 
Overall Best Performance after 10000 Function Evaluations 
(results averaged over 10 runs) 


Default 
Algorithm: Random Search Genetic Algorithm 
Function: 
1 69.400 99.400 
2 Sihe ay ye O00 
3 Se): 99.466 
4 O25 0 93.7635 
5 99.988 90.194 
6 923962 89.016 
7 99.860 CSS 
Source of Variation M.S. D.F. F-ratio Significant 
Function 6606.00 6 245.0 * 
Algorithm 739.00 1 24.9 * 
Function x Algorithm TsO 6 Sore * 
Within Cell 2701 63 
Algorithm x Within Cell Sie 7k 63 


Table 3.2. Experiment 1: Random Search vs. Default Algorithm. 
Population Variance for Default Algorithm 
Measured after 10000 Function Evaluations 
(results averaged over 10 runs) 


Figure 3.8a. Experiment 1: Random Search vs. Default Algorithm. 
Overall Best Performance on Functions 1 and 2 
(o) Random Search 
(+) Default Genetic Algorithm 


Figure 3.8b. Experiment 1: Random Search vs. Default Algorithm. 
Overall Best Performance on Functions 3 and 4 
(o) Random Search 
(+) Default Genetic Algorithm 




Figure 3.8c. Experiment 1: Random Search vs. Default Algorithm. 
Overall Best Performance on Functions 5, 6, and 7 
(o) Random Search 
(+) Default Genetic Algorithm 


Figure 3.9. Experiment 1: Random Search vs. Default Algorithm. 
Average Maximum Frequency for Default Algorithm 


3.6 EXPERIMENT 2: POPULATION SIZE AND MUTATION RATE 

De Jong (1975) constructed several experiments to 
evaluate the parameters of population size, mutation rate, 
and crossover rate. His performance measures, however, were 
online and offline performance, with an emphasis on online. 
For superior online performance, the random element of 
mutation was minimized. Experiment 2 was performed to test 
his results for mutation rates under the measure of overall 
best performance. Since levels of mutation needed to 
maintain population variance are closely tied to population 
size, population size was included as a factor in the 
experiment. 

Genetic algorithms with populations of 50, 100, and 200 
were run on Test Function Set I. Three mutation rates were 
used: 0.001 (De Jong's optimal value), which is $1/(5n)$ for 
n=200, 0.01, which is $1/n$ for n=100, and 0.02, which is $1/n$ 
for n=50. The experiment factors were: 

1. Function (1, 2, 4, 5, 6, 7) 

2. Population size (50, 100, 200) 

3. Replications (4) 

4. Mutation rate (0.001, 0.01, 0.02) 

The results are given in Table 3.3. 

The two-factor interactions listed are easy to 
understand. The interaction between population size and 
mutation rate supports the theory that the mutation rate 
needed to maintain allelic variance is inversely 
proportional to the population size (Holland, 1975). 


Table 3.3. Experiment 2: Population Size and Mutation Rate. 
Overall Best Performance after 8000 Function Evaluations 
(results averaged over 4 runs) 


Population Mutation 
Function Size Rate 
00 SOW al) 
uf 50 100.000 100.000 92.500 
100 100.000 98.500 97.000 
200 100.000 100.000 91.000 
50 60.000 6172250 66.250 
TOO 60.000 62.500 65.000 
200 67.500 Gorn 250 6570010 
50 93 3533 98.368 98.678 
100 98.844 982333 98.306 
200 98.841 98.653 97.186 
50 89.079 96.432 9885 
100 99.999 99.990 99.998 
200 99.998 99.445 99.993 
50 89.261 92°.°136 95.043 
100 922433 Oa ae S32 0N3 
200 96.420 95%. 12:2 O57 4 
50 90.504 99.993 99.960 
100 92.180 98.391 99.910 
200 98.093 96.404 99.966 
Soupce Of Varlation MaSs Dot Dba lOrotgnl picane 
Function 6695.00 5 305-00 * 
Population Size ES 50) 2 Sy PAS) * 
Mutation Rate 35 2 Papo al 
FPuncoLOn x= POpe olze 42.16 10 Mes 
Function x Mut. Rate G63 10 geo * 
POO ecizesxeMUts. Rate 47.81 4 2.96 * 
Punction ex. Pop. Size 
x Mut. Rate L252 20 be DY 
Within Cell 21.96 54 
Materate =x within Gell Gale luo} 

Function and mutation rate also show a significant 
interaction. A high mutation rate improved performance on 
the difficult functions, while degrading performance on the 
easy functions. Introducing more mutations pushed the 
behavior of the genetic algorithm towards that of random 
search. Performance on the damped sinusoid (Function 7) 
especially was improved, but it is clear that mutation at a 
rate of 0.02 or higher is generally undesirable. 

The significance of population size was unexpected. A 
population of 200 was superior to those of 100 and 50 in 
most cases. This may be attributed to the increased 
population variance which derives from large population 
size. Table 3.4 shows the significant decrease in Average 
Maximum Frequency caused by increase in population size. 
Although high mutation rates also increase variance, they do 
so at the expense of the adaptive features of the algorithm. 
Apparently this deleterious effect cancels the improvement 
in performance of the corresponding increase in variance. 

The significance of population size may also be the 
result of variation between runs in the experiment. As 
population size increases, the population variance 
increases, but the expected variance between replications at 
one population size decreases. This effect may prejudice 
the analysis of variance somewhat. 

Table 3.5 charts the fixation behavior in Experiment 2. 
For 100 or more chromosomes in the population, fixation is 
rare to nonexistent. Although great differences in variance 


Table 3.4. Experiment 2: Population Size and Mutation Rate. 
Average Maximum Frequency after 8000 Function Evaluations 
(results averaged over 4 runs) 


Population Mutation 
Function Size Rate 
00m ALO A 202 
1 50 -969 3070 5 
100 ~944 Sasi 748 
200 -336 ~819 5.0 
2 50 MTS 005 aic al 
100 -960 2050 21.85 
200 Sel oS SDS 
4 50 ~945 790 AO 
100 < S00 —qho3 05 
200 Bey lis? ~704 - 668 
5 50 ~914 185 endalee: 
100 O50 739 - 686 
200 ~119 70108 -648 
6 50 ASSL qo25 - 680 
100 ~945 hod woo 
200 ~741 SOS) 295 
i) 50 aoe - 810 a6 
100 7830 TREKS -688 
200 0 30913 2003 
Sources of) Varlatvon Moon Dee hor ae lvoe SLO Me beant 
Function SO) 5 70.40 ~ 
Population Size 210010 2 186.00 x 
Mutation Rate -54680 Z 1.3y.00 * 
Function: x) Pop. Size ~00S710 10 be WG * 
Function x Mut. Rate -OUS 6! 10 sea! * 
Pop. Size x Mut. Rate U2028 4 28.90 * 
Function: x: Pop. Size 
x Mut. Rate 0 O4e0 20 Ies56 
Within celL “U0 D26 54 

Table 3.5. Experiment 2: Population Size and Mutation Rate. 
Fixation after 8000 Function Evaluations 
(results averaged over 4 runs) 


Population Mutation 
Function Size Rate 
« OiOi Oe Oe 
il 50 BROn Ley -050 A) 
100 - 408 -008 -0 
200 a2 50 B46) 
2 50 oie OS AO) 
100 waa 20 a0) 
200 LOO 0 oO 
4 50 -600 ot Om 20 
100 PASI a0) 70 
200 OSS - U0 =O 
5 50 Aas yA) SOY -008 
100 e205 O25 0 
200 OSS 70108 Ald) 
6 50 OPPS 067 0 
100 Sow) a0) oO) 
200 -008 alt) AIO) 
7) 50 asl dg 2033 aOulys 
100 267 025 0 
200 BES ae a0 
Source of Variation M.S Doheee bahe tt Om ot gna Team 
Function sO017 64 5 £0 0:0 * 
Population Size -00861 2 490.00 * 
Mutation Rate wO20 63 2 4:10.00 * 
Funceion ex POpe.Ssize -00678 10 3010 * 
Function x Mut. Rate ~02196 10 AE (nO) * 
Pop. Size x Mut. Rate 7Ow1 20 4 323.00 * 
Function. x Pop. Size 
x Mut. Rate =O Ti 2A7/ 20 BoD * 
Within Cell S001 76 54 


Mut. Rate x Within Cell .00189 108 



still exist in these populations, such differences are not 
reflected by this measure. 

For overall best performance, a population size of 200 
and mutation rate of 0.001 are preferable, while a 
population of 50 and mutation rate of 0.001 are clearly 
inferior. This in no way contradicts De Jong's findings, as 
the online values for Experiment 2 show (Table 3.6). 

In Figure 3.10, the overall best values for the 
algorithm with population 200 and mutation rate 0.001 are 
plotted. It appears that with the possible exception of 
Function 6, the algorithm has reached a plateau by 8000 
function evaluations. Therefore, this stop point will be 
used in the next experiments as well. 


3.7 CONCLUSIONS 

Test Function Set I consists of four one-dimensional, 
multimodal functions and two 30-dimensional functions. In 
Hamming space, two of the functions are linearly independent 
(1 and 4), one has seven closely linked dependent loci 
(5), two have distantly linked dependent loci (2 and 6), and 
one, the damped sinusoid (7), is difficult to analyze, but 
undoubtedly nonlinear. 

The default genetic algorithm exhibited better overall 
best performance on Functions 1, 2, and 4 than random 
search, but was worse on Functions 5, 6, and 7. Overall 
best performance was improved at the expense of online 
performance by increasing population size to 200. 


Table 3.6. Experiment 2: Population Size and Mutation Rate. 
Online Performance after 8000 Function Evaluations 
(results averaged over 4 runs) 


Population Mutation 
Function Size Rate 
~OO1 Ow 202 
1 50 T2760 66.51 52.90 
100 69.62 Sele te es} 49.30 
200 577205 BOanes 43767 
2 50 40.39 23760 Os 
100 Des ven IPS) re.10 8.76 
200 10.24 2245 =3.62 
4 50 Reese I) 65.14 aye io 
100 Theis see! 60.28 52205 
200 6123 Sy 426.10 
5 50 15526 66.62 5a. Od 
100 Saree 65544 C285 
100 685 Aen Bie 46.00 
6 50 Doe oe 30390 345 
100 A307 13.64 NAYES} 
200 2B) 15.96 LAOS 
a 50 80.07 TO.85 Shs 
100 /020 60.60 Siew 
200 59. 70 48.93 39.04 
Source of Variation MeSe Dobets PE -beuto sign ui deane 
Function 15900.00 5 570500 * 
Population Size pS52 200 2 192,00 a 
Mutation Rate S730 0 2 507200 * 
Function x Pop. Size Is leo) 10 Amz * 
FUNGE LON x, Mut.) cRate 579% 30 10 3409 * 
POPerolze, x MU. Rate Bi G7. 00) 4 eh aye * 
FUnecel on ex Pop... 51ze 
x Mut. Rate athens TES 20 2E05 * 
Within Celt 21289 54 

Figure 3.10. Experiment 2: Population Size and Mutation Rate. 
Overall Best Performance for n = 200, P_m = 0.001 




CHAPTER 4 


PARENT SELECTION 


Parent selection in a genetic algorithm is usually 
accomplished by imposing a probability distribution on the 
population which reflects the fitness of each individual. 
The set of parents is then obtained by sampling from the 
distribution. Section 4.1 discusses the formation of the 
distribution. Section 4.2 studies the problem of accurate 
sampling. 


4.1 CONSTRUCTING THE PROBABILITY DISTRIBUTION 
A discrete probability distribution over n points is 
defined as a set of values $p_i$, $i = 1, \ldots, n$, such that 

(1) $\sum_{i=1}^{n} p_i = 1$, and 

(2) $0 \le p_i \le 1$ for $i = 1, \ldots, n$. 

The obvious candidate for the probability of selection 
of an individual $s_i$ in the population is the 
individual's relative fitness, or 

$$p_i = \frac{f(s_i)}{\sum_{s \in M} f(s)}.$$

Relative fitnesses satisfy (1) above, but will satisfy (2) 
only if all fitnesses are non-negative. A fitness value may 
be any real number, so relative fitnesses might not 
constitute a probability distribution. 

In order to create a distribution which reflects 
relative fitness, it is necessary to define a base, and 
measure fitness with respect to this base, yielding 

$$p_i = \frac{f(s_i) - \text{base}}{\sum_{s \in M} [f(s) - \text{base}]}.$$

If $f(s_i) \ge \text{base}$ for $i = 1, \ldots, n$, these $p_i$ will form a valid 
probability distribution. Obviously, if base = 0 the $p_i$ are 
simply relative fitnesses. 
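The construction above can be sketched in a few lines of Python. This is only an illustration: the function name `selection_probabilities`, the default of using the current population worst as base, and the uniform fallback when every fitness equals the base are assumptions, not part of the original text.

```python
def selection_probabilities(fitnesses, base=None):
    """Return p_i = (f_i - base) / sum_j (f_j - base) for a maximization problem."""
    if base is None:
        # Assumed default: the worst (smallest) fitness in the current population.
        base = min(fitnesses)
    if any(f < base for f in fitnesses):
        raise ValueError("base must not exceed any fitness in the population")
    shifted = [f - base for f in fitnesses]
    total = sum(shifted)
    if total == 0:
        # Assumption: all individuals sit exactly at the base, so select uniformly.
        return [1.0 / len(fitnesses)] * len(fitnesses)
    return [x / total for x in shifted]
```

With fitnesses -2, 0 and 2 and the default base, the worst individual receives probability 0; a smaller base such as -4 gives it a non-zero share.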

Since the range of fitnesses encountered is dependent 
on the function being optimized, a fixed base is impossible 
without a priori knowledge. Instead, the genetic algorithm 
should define a base at each generation such that 
$f(s_i) \ge \text{base}$ for the current population. One choice for a 
base is the worst fitness value in the current population. 
This would be the largest base possible. Another natural 
choice is the worst value over the last k evaluations, for 
any desired k > n. Finally, the worst value over all 
evaluations could be used. 

Let T be the total number of individuals evaluated up 
to the time of creation of a distribution. Let $T_{max}$ be the 
maximum number of function evaluations in the execution of 
the algorithm ($T_{max}$ is not necessarily known). Then the 
choices above reduce to 

$$\text{base} = \min_{j \le i \le T} f(s \text{ at time } i)$$

where $j = \max\{1, T+1-k\}$ and $n \le k \le T_{max}$. The choice of k 
determines which base is employed, that is, how much of the 
algorithm history is considered in adjusting the fitnesses. 

The space needed for storage of the worst values for 
$k < T_{max}$ is proportional to k/g. Evaluation of the base 
occurs after each generation of g new individuals, so the 
worst values for k/g successive sets of g evaluations must be 
saved. If $k = T_{max}$, only the worst value overall need be 
maintained. 
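The storage argument can be illustrated with a small Python sketch. The class name `WindowedWorst`, the use of `k=None` to stand for the all-time worst, and rounding the window to whole generations are assumptions made for this example.

```python
import math
from collections import deque


class WindowedWorst:
    """Track the base as the worst fitness over the last k evaluations.

    One minimum is stored per generation of g evaluations, so about k/g
    values are kept. k=None keeps only the all-time worst value.
    """

    def __init__(self, k=None, g=1):
        self.all_time = math.inf
        if k is None:
            self.window = None
        else:
            self.window = deque(maxlen=max(1, math.ceil(k / g)))

    def update(self, generation_fitnesses):
        """Record one generation of fitnesses and return the new base."""
        worst = min(generation_fitnesses)
        self.all_time = min(self.all_time, worst)
        if self.window is not None:
            self.window.append(worst)  # the oldest generation drops out
        return self.base()

    def base(self):
        return self.all_time if self.window is None else min(self.window)
```

Setting k equal to the population size with g equal to the population size reduces the window to the current population worst.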

In Section 4.3, genetic algorithms using the extreme 
values of k = n (the base is the current population worst) 
and $k = T_{max}$ (the base is the all-time worst) will be 
compared. These two values for the base require minimum 
storage and computation time. The comparison should reveal 
the effects on performance of the choice of base for the 
probability distribution. 


4.2 SAMPLING FROM THE DISTRIBUTION 

Once the probability distribution has been constructed, 
the set of parents is formed by sampling from the 
distribution. It is important that the sample taken reflect 
the distribution as closely as possible. If individuals of 
low utility are sampled too often, selection according to 
fitness is impaired. If individuals of high utility are 
oversampled, premature convergence results. 

Clearly, a sampling method must be employed which is 
accurate. Many sampling methods are available in the 
literature (Kleijnen, 1974; Burt, Gaver, and Perlas, 1970; 
Hammersley and Handscomb, 1964; Kahn and Marshall, 1953). 
Most of these define accuracy of sampling as reduction of 
variance. Before comparing the effects of several sampling 
methods on genetic algorithm behavior, it is necessary to 
define accuracy of sampling for the parent selection 
process. 


4.2.1 Measures of Accuracy 

As in Section 4.1, let $p_i$ be the probability of 
selection of individual $s_i$ in the population, $i = 1, \ldots, n$. 

For ease of notation, let m = n' be the size of the sample 
to be taken. Let $e_i$ be the expected number of times $s_i$ will 
occur in the sample. Ideally, $s_i$ should be sampled $m p_i$ 
times, or 

$$e_i = m p_i, \quad i = 1, \ldots, n.$$

Let $f_i$ be the actual frequency, or number of times, $s_i$ is 
sampled by a given method SM. Then $f_i$ is an estimator for 
$e_i$. 

The frequency of sampling of each $s_i$ is of major 
importance in a genetic algorithm. For this reason, 
sampling methods should be analyzed for the accuracy of 
their sampling frequency estimators, rather than for the 
usual accuracy of the mean estimator. 

The accuracy of an estimator is measured by bias and 
variance. For each $f_i$, $i = 1, \ldots, n$, 

$$\text{bias}[f_i] = |E[f_i] - e_i|$$
$$\text{var}[f_i] = E[(f_i - E[f_i])^2].$$

An accurate estimator minimizes both bias and variance. 

It will be hard to evaluate sampling methods if n 
estimators are measured separately for bias and variance. 
The biases of the $f_i$ can be summed to give the bias of the 
method, and likewise the variances: 

$$\text{bias}_{SM} = \sum_{i=1}^{n} \text{bias}[f_i],$$
$$\text{var}_{SM} = \sum_{i=1}^{n} \text{var}[f_i].$$

For a general measure of estimator accuracy, however, bias 
and variance can be combined into mean square error: 

$$MSE[f_i] = \text{var}[f_i] + \text{bias}^2[f_i] = E[(f_i - e_i)^2].$$

Let the mean square error of frequencies for a sampling 
method SM be 

$$MSE_{SM} = \sum_{i=1}^{n} MSE[f_i].$$

This mean square error will be used to compare sampling 
methods for genetic algorithms, but bias and variance will 
be considered separately as well. 
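These measures can also be estimated empirically for any sampling routine. The following Python sketch is one way to do so; the function name `sampling_accuracy`, the `sampler(p, m)` calling convention, and the trial count are assumptions chosen for illustration.

```python
import random
from collections import Counter


def sampling_accuracy(sampler, p, m, trials=1000):
    """Estimate the summed bias, variance and MSE of the frequencies f_i.

    sampler(p, m) must return a list of m indices into p.
    """
    n = len(p)
    expected = [m * pi for pi in p]   # e_i = m * p_i
    sums = [0.0] * n                  # running sum of f_i
    squares = [0.0] * n               # running sum of f_i squared
    sq_err = [0.0] * n                # running sum of (f_i - e_i) squared
    for _ in range(trials):
        counts = Counter(sampler(p, m))
        for i in range(n):
            f = counts.get(i, 0)
            sums[i] += f
            squares[i] += f * f
            sq_err[i] += (f - expected[i]) ** 2
    means = [s / trials for s in sums]
    bias = sum(abs(mu - e) for mu, e in zip(means, expected))
    var = sum(sq / trials - mu * mu for sq, mu in zip(squares, means))
    mse = sum(err / trials for err in sq_err)
    return bias, var, mse


if __name__ == "__main__":
    # Example sampler: independent draws with replacement.
    def draw(p, m):
        return random.choices(range(len(p)), weights=p, k=m)

    print(sampling_accuracy(draw, [0.5, 0.3, 0.2], 4))
```

Because the estimates come from a finite number of trials, an unbiased method will usually show a small non-zero bias estimate.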


4.2.2 Stochastic Sampling with Replacement 

The simplest way to produce a random sample from a 
probability distribution is by stochastic, or Monte Carlo, 
sampling with replacement. First, the cumulative 
distribution is computed. Each sample element is then 
obtained by generating a uniform random number between 0 and 
1 and sampling the point corresponding to this number on the 
cumulative distribution. 

Let random() be a function generating uniform(0,1) 
numbers. Assume that the sample is to be placed in random 
order in sample(j), j=1,...,m. Stochastic sampling with 
replacement (method S) is displayed in pidgin Algol in 
Figure 4.1. 


begin method S: 
    cumdist(1) = p(1) 
    for i from 2 to n do 
        cumdist(i) = cumdist(i-1) + p(i) 
    enddo 
    for j from 1 to m do 
        r = random() 
        for i from 1 to n do 
            if r<cumdist(i) then breakloop endif 
        enddo 
        sample(j) = i 
    enddo 
end method S. 


Figure 4.1. Stochastic Sampling Method 


Under stochastic sampling with replacement, $f_i$ has a 
binomial distribution, or 

$$P(f_i = y) = \binom{m}{y} p_i^{y} (1 - p_i)^{m-y}, \quad y = 0, \ldots, m.$$

Then 

$$E[f_i] = m p_i$$
$$\text{bias}[f_i] = 0$$
$$\text{var}[f_i] = m p_i (1 - p_i)$$
$$MSE_S = \sum_{i=1}^{n} m p_i (1 - p_i) = \sum_{i=1}^{n} e_i (1 - p_i).$$

Stochastic sampling is unbiased. Its mean square error is 
composed entirely of variance. 
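For readers who prefer runnable code, method S might be sketched in Python as follows. The names `stochastic_sample` and `mse_stochastic` are illustrative, indices are 0-based here, and the use of `bisect` in place of the linear scan of Figure 4.1 is a substitution made for brevity.

```python
import bisect
import itertools
import random


def stochastic_sample(p, m, rng=random):
    """Draw m indices with replacement according to the probabilities p."""
    cumdist = list(itertools.accumulate(p))
    sample = []
    for _ in range(m):
        r = rng.random()
        # First index whose cumulative probability exceeds r.
        i = bisect.bisect_right(cumdist, r)
        # Guard against rounding error leaving cumdist[-1] slightly below 1.
        sample.append(min(i, len(p) - 1))
    return sample


def mse_stochastic(p, m):
    """Analytic mean square error of method S: sum of m * p_i * (1 - p_i)."""
    return sum(m * pi * (1 - pi) for pi in p)
```

The analytic value grows linearly with m, which is the property that distinguishes method S from the other two methods considered.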
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4.2.3 Deterministic Sampling 

The following deterministic technique has been referred 
to as the "fixed sequencing method" by Ehrenfeld and 
Ben-Tuvia (1962) and also as "Selective sampling" by Brenner 
(1963). 

For each point i, the expected sampling frequency 
e, = mp; is computed. Each point is allotted samples 
according to the integer portion of this frequency for a 
total of W < m sample elements. The n points are then 
ordered so that the fractional parts of their expected 
frequencies are monotonically decreasing, and the first 
R = m-W points are allotted one more sample element each, to 
yield a total sample size of m. A selection sort is used to 
find the first R points in order, on the assumption that R 
will be reasonably small and only a partial sort will be 
needed. 

Once the number of allotments for each point in the 
distribution is known, a randomly ordered sample must be 
obtained. This can be done in two ways. 1) Sample 
stochastically from the distribution of allotments, 
readjusting the cumulative distribution after each selection 
(Brenner, 1963). 2) Create a list of sample elements. 
Include point i x times if it has been allotted x samples. 
Choose the jth element in the final sample by generating a 
uniform random number between 1 and m+1-j, using that list 
entry and moving the (m+1-j)th entry to replace the one used. 


The second technique requires m units of space for the 
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temporary array of elements. The first requires n units of 
space for the number of samples allotted to each point in 
the distribution, plus a considerable amount of time to 
compute the cumulative distribution repeatedly. For this 
reason, the second ordering technique will be used. The 


deterministic method (method D) is shown in Figure 4.2. 


begin method D: 
    W = 0 
    for i from 1 to n do 
        expected = m * p(i) 
        intexpected = truncate(expected) 
        fraction(i) = expected - intexpected 
        for dummy from 1 to intexpected do 
            W = W + 1 
            list(W) = i 
        enddo 
    enddo 
    for j from W+1 to m do 
        max = 1 
        for i from 2 to n do 
            if fraction(i) > fraction(max) then max = i endif 
        enddo 
        list(j) = max 
        fraction(max) = 0 
    enddo 
    comment order the sample 
    remaining = m 
    for j from 1 to m do 
        r = truncate(remaining * random()) + 1 
        sample(j) = list(r) 
        list(r) = list(remaining) 
        remaining = remaining - 1 
    enddo 
end method D. 


Figure 4.2. Deterministic Sampling Method 

$MSE_D$ reflects the bias of this method, as there is no 
variance in a non-stochastic technique. 
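A compact Python sketch of the deterministic method follows. The function names are illustrative, indices are 0-based, a full sort replaces the partial selection sort, and `shuffle` replaces the explicit ordering loop; these are substitutions for brevity, not the formulation of Figure 4.2.

```python
import math
import random


def deterministic_allotments(p, m):
    """Return the number of sample elements allotted to each point."""
    expected = [m * pi for pi in p]
    counts = [math.floor(e) for e in expected]        # integer parts
    fractions = [e - c for e, c in zip(expected, counts)]
    remainder = m - sum(counts)                       # R = m - W
    # A stable sort keeps the lowest index first among equal fractions.
    by_fraction = sorted(range(len(p)), key=lambda i: -fractions[i])
    for i in by_fraction[:remainder]:
        counts[i] += 1
    return counts


def deterministic_sample(p, m, rng=random):
    """Return the allotted elements in random order."""
    counts = deterministic_allotments(p, m)
    sample = [i for i, c in enumerate(counts) for _ in range(c)]
    rng.shuffle(sample)
    return sample
```

Floating-point rounding can push an expected frequency just below an integer, so exact arithmetic (for example `fractions.Fraction`) may be preferable when the probabilities are known exactly.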


4.2.4 Remainder Stochastic Sampling with Replacement 

The third sampling technique is a combination of the 
first two. Stochastic sampling has no bias but high 
variance, whereas the deterministic approach is biased but 
has no variance. Remainder stochastic sampling combines the 
desirable characteristics of both methods. 

In the remainder stochastic method, the expected 
frequencies of sampling are computed as in the deterministic 
method, and each point is allotted sample elements according 
to the integer part of this frequency. But then, instead of 
ordering the fractions $r_i$, a new probability function is formed 
over the n points using the $r_i/R$ as the new $p_i$. The R 
remaining elements are sampled by the stochastic method 
according to this new function. Remainder stochastic 
sampling with replacement (method RS) is illustrated in 
Figure 4.3. 

When all e, are integers, this method is equivalent to 
the deterministic method. Since this is not usually the 
case, consider the method when R > 0. 

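Examine the stochastic stage of sampling which is done 
over the function based on the $r_i$'s. Let 

$$f_i = \lfloor e_i \rfloor + g_i,$$

where $g_i$ is the number of times i is selected during the 
stochastic sampling. $g_i$ is binomially distributed. Thus 
$E[g_i] = R \cdot (r_i/R) = r_i$, so that $E[f_i] = e_i$ and the 
method is unbiased, with $\text{var}[f_i] = r_i (1 - r_i/R)$. 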
begin method RS: 
    W = 0 
    for i from 1 until n do 
        expected = m * p(i) 
        intexpected = truncate(expected) 
        fraction(i) = expected - intexpected 
        for dummy from 1 to intexpected do 
            W = W + 1 
            list(W) = i 
        enddo 
    enddo 
    R = m - W 
    if R = 0 then stop endif 
    for i from 2 to n do 
        fraction(i) = fraction(i-1) + fraction(i) 
    enddo 
    for j from W+1 to m do 
        r = R * random() 
        for i from 1 to n do 
            if r<fraction(i) then breakloop endif 
        enddo 
        list(j) = i 
    enddo 
    remaining = m 
    for j from 1 until m do 
        r = truncate(remaining * random()) + 1 
        sample(j) = list(r) 
        list(r) = list(remaining) 
        remaining = remaining - 1 
    enddo 
end method RS. 


Figure 4.3. Remainder Stochastic Sampling Method 
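The remainder stochastic method can be sketched in Python in much the same way. The function name is illustrative, indices are 0-based, and `random.choices` with the fractional parts as weights stands in for the cumulative-fraction scan of Figure 4.3.

```python
import math
import random


def remainder_stochastic_sample(p, m, rng=random):
    """Allot integer parts deterministically, then draw the rest stochastically."""
    expected = [m * pi for pi in p]
    counts = [math.floor(e) for e in expected]
    fractions = [e - c for e, c in zip(expected, counts)]
    sample = [i for i, c in enumerate(counts) for _ in range(c)]
    remainder = m - len(sample)                       # R = m - W
    if remainder > 0:
        # Each remaining element is drawn with probability fraction_i / R.
        sample += rng.choices(range(len(p)), weights=fractions, k=remainder)
    rng.shuffle(sample)
    return sample
```

A point whose expected frequency is an integer has a zero fractional part, so it can never receive more than its integer allotment.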
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4.2.5 Stochastic Sampling Without Replacement 

Stochastic sampling with replacement has no bias, but 
does have variance. It may be possible to reduce the 
variance of stochastic sampling by sampling without 
replacement from the expected frequencies $e_i$. 

In sampling without replacement, a random element is 
sampled from a set and that element is then removed from the 
set before the next sample is taken. Given a probability 
distribution $p_i$ over n points, sampling without replacement 
is ordinarily accomplished by computing $e_i = m p_i$ for 
$i = 1, \ldots, n$, sampling from these $e_i$, and reducing $e_i$ by 
one every time element i is sampled. Unfortunately, $e_i$ can 
be reduced to at most zero after a sample, so if $e_i < 1$, 
the amount of reduction can be at most $e_i$. Sampling once 
from an element i which has $e_i < 1$ can introduce bias into 
the method. 

Consider the simple case where n = m = 2 and 
$p_1 > p_2 > 0$. Initially, $e_1 = 2p_1 > 1$ and $e_2 = 2p_2 < 1$. The 
expected value of $f_i$ is the sum over all samples of the 
probability that i is taken as that sample. 

$$\begin{aligned}
E[f_1] &= P(\text{1 is first sample}) + P(\text{1 is second sample}) \\
       &= p_1 + p_1 P(\text{1 is second, given 1 is first}) + p_2 P(\text{1 is second, given 2 is first}) \\
       &= p_1 + p_1 (2p_1 - 1) + (1 - p_1)(1) \\
       &= 2p_1^2 - p_1 + 1 \\
       &< 2p_1, \quad \text{since } 1/2 < p_1 < 1. \\
E[f_2] &= p_2 + p_1 (2p_2) + p_2 (0) \\
       &= p_2 + 2(1 - p_2) p_2 \\
       &= p_2 (3 - 2p_2) \\
       &> 2p_2, \quad \text{since } 0 < p_2 < 1/2.
\end{aligned}$$

The method in this case is biased towards the overselection 
of the element with the smaller probability. 
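The two-point calculation can be checked exactly by enumerating every sequence of draws. The Python sketch below does this with rational arithmetic; the function name `expected_frequencies` and the recursive formulation are assumptions made for illustration, and the enumeration is only practical for very small n and m.

```python
from fractions import Fraction


def expected_frequencies(e, draws):
    """Exact E[f_i] when sampling without replacement from weights e.

    Each draw picks i with probability e_i / sum(e), then reduces e_i by
    one, but never below zero.
    """
    n = len(e)
    if draws == 0:
        return [Fraction(0)] * n
    total = sum(e)
    result = [Fraction(0)] * n
    for i, weight in enumerate(e):
        if weight == 0:
            continue
        prob = weight / total
        reduced = list(e)
        reduced[i] = max(Fraction(0), weight - 1)
        rest = expected_frequencies(reduced, draws - 1)
        for k in range(n):
            result[k] += prob * (rest[k] + (1 if k == i else 0))
    return result


if __name__ == "__main__":
    p1, p2 = Fraction(3, 4), Fraction(1, 4)
    print(expected_frequencies([2 * p1, 2 * p2], 2))
```

For $p_1 = 3/4$ the closed forms above give $E[f_1] = 11/8 < 3/2$ and $E[f_2] = 5/8 > 1/2$, which is what the enumeration returns.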

Sampling without replacement can also be applied in the 
remainder stochastic method. In this case $e_i = r_i$ initially 
for $i = 1, \ldots, n$. As shown in Appendix B, this method will be 
biased unless $r_i = r_j$ for all $r_i$ and $r_j \ne 0$. The method of 
sampling without replacement where initially $e_i = m p_i$ will 
almost always be biased. This can be proved for remainder 
stochastic sampling without replacement. Stochastic 
sampling without replacement will probably be biased unless 
the $e_i$ are integers for all i. 


4.3 COMPARISONS 

The mean square errors for the deterministic (D), 
remainder stochastic with replacement (RS), and stochastic 
with replacement (S) methods are ordered 


$$MSE_D \le MSE_{RS} \le MSE_S$$


(see Appendix C for proofs). Furthermore, the strict 
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Method RS has the least variance of the two unbiased methods 
S and RS. Although the two methods of stochastic and 
remainder stochastic sampling without replacement have not 
been analyzed for mean square error, they are biased. 

Brenner (1963) has shown that the deterministic method 
has the minimum mean square error of any possible sampling 
method. Therefore if bias is acceptable, deterministic 
sampling should be preferable to any of the stochastic 
methods. 
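The ordering of the three mean square errors can be checked for a particular distribution with a short Python sketch. It assumes that the stochastic stage of method RS draws R elements binomially with probabilities $r_i/R$, giving $\sum_i r_i(1 - r_i/R)$, and that the error of method D is the squared deviation of its allotments from the $e_i$. The function name `analytic_mse` is illustrative.

```python
import math
from fractions import Fraction


def analytic_mse(p, m):
    """Return (MSE_D, MSE_RS, MSE_S) for distribution p and sample size m."""
    e = [m * pi for pi in p]                     # expected frequencies
    whole = [math.floor(x) for x in e]           # integer allotments
    r = [x - w for x, w in zip(e, whole)]        # fractional parts
    R = m - sum(whole)                           # elements left to allot
    mse_s = sum(x * (1 - pi) for x, pi in zip(e, p))
    mse_rs = sum(ri * (1 - ri / R) for ri in r) if R > 0 else 0
    # Method D gives one extra element to the R largest fractions.
    order = sorted(range(len(p)), key=lambda i: -r[i])
    extra = set(order[:R])
    mse_d = sum((w + (i in extra) - x) ** 2
                for i, (w, x) in enumerate(zip(whole, e)))
    return mse_d, mse_rs, mse_s


if __name__ == "__main__":
    p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
    print(analytic_mse(p, 4))
```

Passing `Fraction` probabilities keeps the arithmetic exact; with floats the integer parts can be off by one when an expected frequency is very close to an integer.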


4.3.1 Experiment 3: Sampling Methods 

The mean square errors of stochastic, remainder 
stochastic and deterministic sampling are usually unequal. 
Experiment 3 was designed to show the effects of differences 
in accuracy among the three methods. Samples of sizes 
m = 50, 100, and 200 were taken from a population of n = 50 
points according to uniform random distributions with 
variances of 0.01, 1, and 100. The factors were 

1. Variance of the Distribution (0.01, 1, 100) 
2. Sampling Method (S, RS, D) 




Results are shown in Table 4.1. The mean square error 
of the deterministic method is far lower than that of the 
other methods. As expected, the mean square error of method 
RS is limited by n, while that of method S grows as m grows. 
Resource requirements and relative accuracies for the three 
methods are summarized in Table 4.2. If m = 2n, as is the 
case for parent selection sampling in a genetic algorithm, 
space and time requirements for the algorithms are all of 
the same order of complexity. 


4.3.2 Experiment 4: Parent Selection Methods 

It remains to be seen whether the large differences in 
accuracy among these sampling methods will affect the 
performance of a genetic algorithm. In Experiment 4, six 
sampling methods were run on Test Function Set I using two 
values of the base for the probability distribution. The 
algorithms employed a population size of 200, $P_m = 0.001$, 
and $P_c = 0.6$. The factors were 

1. Function (1, 2, 4, 5, 6, 7) 

2. Sampling Method (deterministic, remainder 
   stochastic without replacement, stochastic without 
   replacement, remainder stochastic with 
   replacement, stochastic with replacement, ranking 
   method) 

3. Replications (4) 

4. Base of the Probability Distribution (worst 
   all-time, current population worst). 


The last method of selection is a ranking method proposed by 
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Table 4.1. Experiment 3: Sampling Methods. 
Mean Square Error 
(results averaged over 10 runs on populations of size 50) 


Population  Sample                        Method 
Variance    Size (m)  Deterministic  Remainder Stochastic  Stochastic 
oul 50 TANS 45.9 5052 
100 hell 45.0 LoL. 8 
200 8.4 bl. 38 20 Oved 
HL: 50 19 47.4 54.6 
100 Shou 48.0 MOS re 
200 Bes 46.8 1.9956 
100 50 S23 Aya) 52.4 
100 4 48.9 91.58 
200 Q.2 NAY PAL 
Source of Variation M.S. D.F. F-ratio Significant 
Population Variance ees 2 20S 
Sample Size 61333050 2 S000 * 
Sampling Method 280500.0 2 1410.00 * 
Var. x Sample Size 104.6 4 A syr 
Van. x Sampling Method WP) 4 - 06 
Size x Method 592 63.0 4 2381.00 * 
Var. x Size x Method OAS 8 -90 
Within Cell 94.2 27 
Sizer xe Wercoins Ce 2. One 54 
Method x Within Cell L595 54 
Table 4.2. Comparison of Sampling Methods. 


               Mean                               Time(b) 
Method         Square Error  Biased  Space(a)  Worst    Average 
Stochastic     high          no      n         O(mn)    O(mn) 
Remainder 
Stochastic     moderate      no      n+m       O(mn)    O(n^2) 
Deterministic  low           yes     n+m       O(mn)    O(n^2) 

a: n = number of points in the distribution, 
   m = sample size. 
b: worst case time assumes R = m-1, 
   average case time assumes R = n/2. 
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Art Wetzel (1979) which differs radically from the other 
selection methods. Each parent is selected by sampling 
twice from the distribution (stochastic sampling with 
replacement) and using the better of the two individuals. 
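The selection rule just described might be sketched in Python as follows. The function names are illustrative, and "better" is taken to mean the larger fitness value, which assumes a maximization problem.

```python
import random


def better_of_two(p, fitness, rng=random):
    """Sample two indices from p with replacement and keep the fitter one."""
    i, j = rng.choices(range(len(p)), weights=p, k=2)
    return i if fitness[i] >= fitness[j] else j


def select_parents(p, fitness, m, rng=random):
    """Select m parents, each as the better of two stochastic draws."""
    return [better_of_two(p, fitness, rng) for _ in range(m)]
```

With two equally likely individuals, the fitter one is returned unless both draws pick the weaker one, that is, with probability 0.75 rather than 0.5.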

Table 4.3 shows the results of Experiment 4. The 
sampling method is significant, but not to a great degree. 
Remainder stochastic sampling (with or without replacement) 
is usually inferior. Deterministic sampling and stochastic 
sampling without replacement are often better than the 
others. It is interesting to note that both of these 
methods are biased. Apparently low mean square error is 
preferable to lack of bias in the sampling process. 

Preliminary experiments indicated that the choice of 
base would also be significant. Those experiments were run 
on populations of 50 individuals for 200 generations. Using 
the current population worst as the base hastened 
convergence considerably, often to a sub-optimal value. 
Apparently over a larger population, with a run length of 
only 40 generations, the choice of base is less important, 
because the difference between all-time worst and current 
population worst values is small. 

The effects of sampling method and distribution base on 
population variance are shown in Table 4.4. The all-time 
worst value base maintained population variance 
significantly more than the base of the current population 
worst. Sampling by the deterministic method and stochastic 


sampling without replacement also maintained population 
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Table 4.3. Experiment 4: 


Parent Selection Methods. 


Overall Best Performance after 8000 Function Evaluations 
(results averaged over 4 runs) 


                          Sampling Method(b) 
Function  Base(a)    D     RSW    SW     RS     S      RA 
1 A O70 LUO OF 10'0%20'0 94500 100.00 100.00 
C LOD OCR LOU CUS  LOUR COs Sh00R 00 LO0200 00.00 
2 A Sis Sis) Galerz.5 6750 Dion 707,00 63.75 
© OWie 50 Doren) O 65.00 eg AS) Gee) 657.00 
4 A hers vs) Nek FAT) Moby SA O51 4 98,76 99°76 
G 99.74 99%,,05 IS Ts) 98.91 O94) 99.84 
3) A 100.00 98261) 00r 00 O97. Oi, ORO 97.30 
G LOOR(0'0 977..0'3 O94 OW 35 JOO 000700 
6 A O22 94.51 Messe) 89.45 94.09 30795 
(e 96r3 5 O22 ayy 48) 88.84 O03 90.61 
i A 100.00 D6r2 6 98.34 94.49 9653 93755 
C Be 5 US, 93648 99% 04 93.33 OBR 03 90 18 
a: base of the probability distribution 
   A = all-time worst value 
   C = current population worst 
b: sampling methods 
   D = deterministic 
   RSW = remainder stochastic without replacement 
   SW = stochastic without replacement 
   RS = remainder stochastic with replacement 
   S = stochastic with replacement 
   RA = ranking method 


Source of Variation        M.S. 
Function                   8851.00 
Sampling Method 
Distribution Base 
Function x Method 
Function x Base 
Sampling Method x Base 
Function x Method x Base 
Within Cell 
Base x Within Cell 
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Table 4.4. Experiment 4: 


Parent Selection Methods. 


Average Maximum Frequency after 8000 Function Evaluations 
(results averaged over 4 runs) 


                          Sampling Method(b) 
Function  Base(a)    D     RSW    SW     RS     S      RA 
1 A -838 ~964 ~904 soo2 882 2999 
C 2299 2-999 -999 he ke ~999 999 
2 A «879 946 ane or & 2892 980 
C ~970 a Fhe ~967 986 wee 988 
+ A ~728 a Le ¥ Oy be -980 ~/784 «999 
Cc -835 2945 ~829 “969 2879 ~2o9 
5 A -718 ~834 25 ~934 At63 Se 
C -760 ~848 Ee PAY ee Jere: -802 saad 
6 A 786 BL pel) ~681 -946 - 788 be AE 
& - 196 -978 -865 mie he ds - 850 <o99 
7 A ~Ia6 -867 ~ t62 ~958 Cis 2 <2FS 
c me Sy i 866 ~744 ~934 Dod oF WE =959 


a: base of the probability distribution 
   A = all-time worst value 
   C = current population worst 
b: sampling methods 
   D = deterministic 
   RSW = remainder stochastic without replacement 
   SW = stochastic without replacement 
   RS = remainder stochastic with replacement 
   S = stochastic with replacement 
   RA = ranking method 


Source of Variation        M.S.         D.F.  F-ratio      Significant 
Function                   .14580          5    88.80      * 
Sampling Method            .25820          5   157.00      * 
Base                       .12500          1    97.10      * 
Function x Method          .01264         25     7.69      * 
Function x Base            [illegible]     5     6.78      * 
Sampling Method x Base     .01200          5  [illegible]  * 
Function x Method x Base   .00290         25     2.26      * 
Within Cell                .00164        108 
Base x Within Cell         .00129        108 
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variance, which no doubt contributed to their superior 
overall best performance. The deterministic method, using 
the all-time worst as base, provided consistently lower 
average maximum frequency than all other selection methods 
tested. 

To complete the analysis, the overall best performance 
of the two best sampling methods was plotted over time (see 
Figures 4.4a and 4.4b). Each point was averaged over four 
runs using the all-time worst value as base. With the 
exceptions of Functions 5 and 7, the early performances of 
the two algorithms were identical. On Functions 5 and 7, 
the differences were small and short-lived. This means that 
the greater average maximum frequency of stochastic sampling 
without replacement was not a symptom of faster convergence 
to a good value. Since overall best performance of the two 
methods is similar and deterministic sampling provides more 
population variance, the deterministic method of sampling 
will be employed in the remaining experiments. 


4.4 CONCLUSIONS 
The parent selection probability distribution can be 
constructed according to 

$$p_i = \frac{f(s_i) - \text{base}}{\sum_{s \in M} [f(s) - \text{base}]}.$$

The use of all-time worst value as base is best at 
maintaining population variance. 

The sampling of parents according to this distribution 
must be accurate. Remainder stochastic sampling with 
replacement has the lowest variance for an unbiased method. 
Deterministic sampling, although biased, provides the 
greatest accuracy in terms of mean square error of sampling 
frequencies. Deterministic sampling yields greater 
population variance than the other methods when employed in 
a genetic algorithm. The deterministic and stochastic 
without replacement methods are good with respect to overall 
best performance, although choice of sampling method alone 
did not affect performance greatly. 
Figure 4.4a. Experiment 4: Parent Selection Methods. 

Figure 4.4b. Experiment 4: Parent Selection Methods. 
Overall Best Performance on Functions 5, 6, and 7 
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CHAPTER 5 


DIPLOIDY AND DOMINANCE 


This chapter investigates diploidy and several 
dominance schemes as mechanisms for reducing loss of 
population variance. First, the natural systems involving 
diploidy will be examined, and then a set of dominance 
schemes for genetic algorithms will be proposed. 


5.1 NATURAL DIPLOIDY 

Diploid organisms have chromosomes in homologous pairs. 
There may be many such pairs in a higher organism. The 
following discussion assumes organisms with only one 
homologous pair; however, the concepts are easily 
generalized to multiple pairs. 

A diploid organism is homozygous for an allele $a_0$ at a 
locus if both chromosomes contain $a_0$ at that locus. If the 
two chromosomes contain different alleles at the locus, the 
organism is heterozygous. Often in a heterozygous 
individual, the phenotype will be determined using only one 
of the alleles at a locus. This allele is said to be 
dominant over the allele not used. That allele is called 
recessive. 


5.1.1 The Reproductive Cycle 

For haploid organisms, the cycle of reproduction is 
fairly simple. An organism produces an exact copy of itself 
via mitotic division. This copy may then recombine with the 
copy of another parent (see Figure 5.1a). Diploid organisms 
undergo meiotic division (Figure 5.1b) followed by zygote 
formation. In meiotic division, both of the homologous 
chromosomes are duplicated. Crossover occurs among the four 
chromosomes in the parent. A zygote (genetically complete 
diploid organism) is then formed by pairing one of the 
chromosomes with a chromosome from another parent. The 
offspring always contains genetic information from both 
parents. 


5.1.2 Recombination 

After duplication there are four chromosomes in a 
parent. Of these, two rarely undergo crossover, while the 
other two recombine with a frequency proportional to 
chromosome length. If the probability of crossover between 
the middle two chromosomes is $P_c'$, then the probability that 
the one chromosome in the offspring is the result of 
crossover is $0.5 \cdot P_c'$. The meiotic cycle can be simulated by 
using a crossover probability of $P_c = 0.5 \cdot P_c'$, for 
$0 \le P_c \le 0.5$, and omitting the duplication step 
(Figure 5.1c). 

In haploid populations, crossover is the main mechanism 
for recombining alleles in the genotype. In diploid 
populations, however, zygote formation recombines 
chromosomes to create new genotypes. Consider a haploid 
population with crossover rate $P_c$ and no mutation. The 
probability that an offspring is identical to one of its 
parents is equal to the probability of no crossover plus the 
probability that crossover occurs but the parents are 
identical on one side of the crossover point. 

Now consider a diploid population containing copies of 
unique chromosomes $C_i$, $i = 1, \ldots, k$. Assume that there is no 
crossover or mutation. Define the probability of no visible 
recombination as the probability that an offspring is an 
exact copy of one of its parents. 

Suppose a parent having chromosomes $C_1$ and $C_2$ mates 
with a parent having chromosomes $C_3$ and $C_4$. If the 
offspring receives chromosomes $C_1$ and $C_3$, for example, it 
will be a copy of one parent if $C_1 = C_4$ or $C_2 = C_3$. In 
general, 

$$P_{diploid}(N.V.R.) = 2P(C_i = C_j),$$

for any two chromosomes $C_i$, $C_j$ sampled from the population. 
If the chromosomes are distributed randomly throughout the 
population and the individuals mate randomly, this becomes 

$$P_{diploid}(N.V.R.) = 2 \sum_{i=1}^{k} p(C_i)^2.$$

Assume that $p(C_i) = 1/k$ for $i = 1, \ldots, k$, that is, that the k 
chromosome types occur in equal numbers. Then 

$$P_{diploid}(N.V.R.) = 2 \sum_{i=1}^{k} \frac{1}{k^2} = \frac{2}{k}.$$

If the number of chromosome types in the population is 
greater than five, or k > 5, 

$$P_{diploid}(N.V.R.) < P_{haploid}(N.V.R.)$$

when $P_c$ for a haploid population is 0.6. 

If $p(C_i) \ne 1/k$ for $i = 1, \ldots, k$, the probability of no 
visible recombination will increase. But unless the 
population has very little variety in its chromosomes, 
fertilization alone will provide as much recombining as 
crossover provides in a haploid population. 
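As a small numerical illustration, the diploid probability of no visible recombination can be computed directly from the chromosome frequencies. The function name is illustrative, and the comparison value $1 - P_c$ is used only as a lower bound on the haploid probability, since the haploid probability is at least the probability of no crossover.

```python
from fractions import Fraction


def diploid_no_visible_recombination(freqs):
    """Return 2 * sum of p(C_i)^2 for the given chromosome frequencies."""
    return 2 * sum(f * f for f in freqs)


if __name__ == "__main__":
    haploid_lower_bound = 1 - Fraction(6, 10)    # 1 - P_c with P_c = 0.6
    for k in (5, 6, 10):
        uniform = [Fraction(1, k)] * k
        value = diploid_no_visible_recombination(uniform)
        print(k, value, value < haploid_lower_bound)
```

With equal frequencies the value is 2/k, so it drops below 0.4 as soon as k exceeds five; unequal frequencies raise it.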

It remains desirable to recombine via crossover in 
diploid populations. However, high crossover rates 
($P_c > 0.5$) for the simulated reproductive cycle are not 
needed for effective recombination, and are not found in 
natural diploid organisms. Drosophila melanogaster (a fruit 
fly) and Neurospora crassa (a fungus) exhibit crossover in 
5% to 50% of their offspring chromosomes (Strickberger, 
1976). Only very short chromosomes show less than 30% 
crossover. It would seem reasonable to place $P_c'$ between 0.6 
and 1.0, yielding $0.3 \le P_c \le 0.5$. 


5.1.3 Mutation 

The crossover rate needed for recombination in diploid 
organisms is lower than that needed in haploid organisms. 
The mutation rate needed to prevent the loss of alleles from 
the population is also smaller under diploidy with 
dominance. The following is a modification of the 
presentation in Holland (1975). 

Let the alleles possible at one locus be $a_0, a_1, \ldots, a_k$. 
Assume that $a_0$ is the least fit of these on the average. 
Let $ph(a_0,t)$ be the proportion of the population at time t 
which displays $a_0$ in the phenotype. Let the rate of 
reproduction of $a_0$ be $1-e$ for some $0 < e \le 1$. This means 
that at time t+1 in a population of size n, it is expected 
that $2n(1-e)\,ph(a_0,t)$ parents will be selected which have $a_0$ 
employed in the phenotype. 

Let $p(a_0,t)$ be the proportion of the chromosomes in the 
population at time t which contain $a_0$. To guarantee that $a_0$ 
never disappears from the population, a mutation rate is 
needed such that if $p(a_0,t) = 1/N$, then $p(a_0,t+1) \ge 1/N$, where 
N is the total number of chromosomes in the population. The 
expected number of $a_0$'s in the population at time t+1 is 
equal to the number which survive evaluation in the 
phenotype, plus the number which are masked in other 
phenotypes, plus the number which mutate to $a_0$, minus the 
number which mutate away from $a_0$. The expected number of 
$a_0$'s in the population is computed as 

$$\begin{aligned}
N \cdot p(a_0,t+1) ={} & n \cdot (1-e) \cdot ph(a_0,t) \cdot (\text{number of } a_0\text{'s in each organism with a phenotype using } a_0) \\
 & + n \cdot (\text{average reproductive rate for alleles not } a_0) \cdot (\text{proportion of genotypes containing masked } a_0\text{'s}) \cdot (\text{number of } a_0\text{'s masked}) \\
 & + N \cdot P_m \cdot (1 - p(a_0,t)) \\
 & - N \cdot P_m \cdot p(a_0,t).
\end{aligned}$$

In a haploid population, N = n, $p(a_0,t) = ph(a_0,t)$, and no 
masked alleles are carried in a genotype. Therefore 

$$n \cdot p(a_0,t+1) = n(1-e)\,p(a_0,t) + nP_m(1 - p(a_0,t)) - nP_m\,p(a_0,t)$$
$$p(a_0,t+1) = (1-e)\,p(a_0,t) + P_m(1 - 2p(a_0,t)).$$

If $p(a_0,t) = 1/n$, then $p(a_0,t+1) \ge 1/n$ if 

$$P_m \ge \frac{e}{n-2}.$$

If $a_0$ has a very low average fitness, it will have a 
reproductive rate of 0, or e will equal 1. The mutation 
rate needed to insure the presence of very bad alleles under 
haploidy is 

$$P_m \ge \frac{1}{n-2}.$$
In a diploid population, N = 2n. Assume that the
alleles are uniformly distributed over the population. Let
a_1, a_2, ..., a_k each be dominant over a_0. Then a_0 will be used
in the phenotype whenever the individual is homozygous for
a_0, so that

   ph(a_0,t) = p(a_0,t)^2.

The proportion of phenotypes heterozygous for a_0 will be

   2p(a_0,t)[1 - p(a_0,t)].

The average reproductive rate for alleles other than a_0
is

   1 + e * ph(a_0,t) / (1 - ph(a_0,t)).

The expected number of a_0's at time t+1 is

   2n*p(a_0,t+1) = n(1-e)p(a_0,t)^2 * (2)
                 + n[1 + e * p(a_0,t)^2 / (1 - p(a_0,t)^2)]
                      * 2p(a_0,t)[1 - p(a_0,t)] * (1)
                 + 2n*P_m*(1 - p(a_0,t))
                 - 2n*P_m*p(a_0,t).


5.1.4 A Diploid Reproductive Plan
Diploid organisms employ a different reproductive cycle 
than the cycle of haploid organisms. Mitosis is replaced by 
meiosis and fertilization. The observed crossover rate for
diploid organisms is P_c = 0.3 to 0.5, compared to P_c = 0.6 in
previous haploid genetic algorithms. The mutation rate
needed to maintain alleles in the population is on the order
of 1/N^2, whereas it is on the order of 1/N for haploid populations.
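
The mutation rates needed under haploidy and diploidy can be compared numerically. The following is a minimal sketch rather than part of the original presentation. It assumes two alleles per locus, complete recessiveness of the poorer allele a_0, and the expected-value recurrences of the preceding section; all function names are illustrative.

```python
def next_prop_haploid(p, e, pm):
    """Expected proportion of a_0 after one generation in a haploid population."""
    return (1 - e) * p + pm * (1 - 2 * p)


def next_prop_diploid(p, e, pm):
    """Expected proportion of a_0 chromosomes after one generation when a_0
    is completely recessive, so that it is expressed with frequency p**2."""
    ph = p * p                          # homozygotes display a_0
    other_rate = 1 + e * ph / (1 - ph)  # reproductive rate of all other phenotypes
    survivors = (1 - e) * ph            # a_0 chromosomes from a_0 phenotypes
    masked = other_rate * p * (1 - p)   # a_0 chromosomes carried by heterozygotes
    return survivors + masked + pm * (1 - 2 * p)


def min_rate_haploid(n, e=1.0):
    """Smallest pm that keeps p at 1/n in a haploid population of n chromosomes."""
    return e / (n - 2)


def min_rate_diploid(n_chrom, e=1.0):
    """Smallest pm that keeps p at 1/N in a diploid population of N chromosomes."""
    p = 1.0 / n_chrom
    deficit = p - next_prop_diploid(p, e, 0.0)  # loss when there is no mutation
    return deficit / (1 - 2 * p)
```

For N = 200 chromosomes and e = 1, this sketch gives 1/198 for the haploid case and a value within about one percent of 1/N^2 = 0.000025 for the diploid case.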
The following diploid Reproductive Plan will be used
for experiments in Chapter 5:
1. 100 diploid organisms are generated according
to a uniform random distribution over the set
of possible genotypes.
2. 200 parents are selected by deterministic
sampling over a distribution reflecting the
utility of the phenotypes.
3. 100 parental groups are formed by random
pairing. Single crossovers occur between the
two chromosomes in each parent with a
frequency 0.3 <= P_c <= 0.5. Mutation occurs at
each locus with a frequency
P_m = 1/N^2 = 0.000025. One chromosome is chosen
at random from each parent, forming one
diploid offspring per parental group.
4. The old population is completely replaced by
the 100 offspring.
5. The cycle repeats from step 2.
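
The steps of the plan can be sketched in code. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: chromosomes are lists of bits, phenotypes are formed with a fixed global dominance map, `utility` is a caller-supplied non-negative function, and deterministic sampling is taken to mean integer parts of the expected parent counts followed by the largest fractional parts. All names are hypothetical.

```python
import random


def express(chrom_a, chrom_b, dominant):
    """Phenotype under a global map: the dominant allele wins in heterozygotes."""
    return [a if a == b else d for a, b, d in zip(chrom_a, chrom_b, dominant)]


def deterministic_sample(utilities, n_parents):
    """Give each organism the integer part of its expected number of parent
    slots, then hand out the remaining slots by largest fractional part."""
    total = sum(utilities)
    if total <= 0:                       # degenerate case: treat all as equal
        utilities, total = [1.0] * len(utilities), float(len(utilities))
    expected = [n_parents * u / total for u in utilities]
    chosen = [i for i, x in enumerate(expected) for _ in range(int(x))]
    by_fraction = sorted(range(len(expected)),
                         key=lambda i: expected[i] - int(expected[i]),
                         reverse=True)
    remaining = max(0, n_parents - len(chosen))
    return chosen + by_fraction[:remaining]


def gamete(organism, p_cross, p_mut, rng):
    """Cross the parent's two chromosomes, pick one at random, and mutate it."""
    a, b = list(organism[0]), list(organism[1])
    if len(a) > 1 and rng.random() < p_cross:
        cut = rng.randrange(1, len(a))   # single crossover point
        a[cut:], b[cut:] = b[cut:], a[cut:]
    picked = rng.choice((a, b))
    return [1 - bit if rng.random() < p_mut else bit for bit in picked]


def next_generation(population, utility, dominant, rng,
                    p_cross=0.5, p_mut=0.000025):
    """Steps 2 to 4: select 2n parents, pair them at random, and replace the
    old population with one diploid offspring per parental group."""
    n = len(population)
    scores = [utility(express(a, b, dominant)) for a, b in population]
    parents = [population[i] for i in deterministic_sample(scores, 2 * n)]
    rng.shuffle(parents)
    return [(gamete(parents[2 * k], p_cross, p_mut, rng),
             gamete(parents[2 * k + 1], p_cross, p_mut, rng))
            for k in range(n)]
```

Calling `next_generation` repeatedly on a population of 100 randomly generated organisms corresponds to step 5.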
5.2 DOMINANCE SCHEMES

A dominance scheme maps two alleles at a locus onto one 
value to be used in determining the phenotype. The benefits 
of diploidy in avoiding loss of population variance would be 
greatest if the poorer alleles were completely recessive. 
Such deleterious alleles would be masked in heterozygotes, 
and thus preserved in the population. If an inferior allele 
was dominant, it would probably disappear quickly from the
population. 

There is no evidence that in natural diploid organisms 
better alleles are intrinsically dominant. In fact, it is
not known whether or not there is any relationship between
average allele fitness and dominance. Although dominance
has been studied by population geneticists (e.g. Watterson;
Ford and Sheppard, 1965; Clarke, 1964; Crosby, 1963),
no firm conclusions have been drawn about its development or
utility. However, it is possible to examine natural
dominance mechanisms and construct hypotheses about possible
relationships to fitness.

Consider a locus in a diploid organization with alleles
{a_0, a_1}. Assume that this locus specifies a simple trait or
phenotype. The values of + and - for the trait are produced
by the presence and absence, respectively, of a gene product
in the organism. Assume that allele a_1 codes for a
sufficient amount of this product, while a_0 codes for
nothing. Then the phenotypic values for the three possible
genotypes will be:

   a_1 a_1 --> +
   a_1 a_0 --> +
   a_0 a_0 --> -

a_1 is dominant over a_0.

In a genetic algorithm, dominance by presence of a 
product can be simulated by a random, fixed, global 
dominance map. Initially, a chromosomal map is generated 
randomly to indicate which of the alleles at each locus is 
the producer, or dominant allele. This map applies globally 
to all individuals in the population and remains fixed 
throughout time.
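
A fixed, global dominance map can be sketched as follows. This is a minimal illustration that assumes binary alleles stored as Python lists; the function names are hypothetical and not taken from the source.

```python
import random


def make_fixed_map(n_loci, rng):
    """Pick the dominant (producer) allele at each locus once, at random."""
    return [rng.randint(0, 1) for _ in range(n_loci)]


def phenotype_fixed(chrom_a, chrom_b, dominance_map):
    """Homozygous loci express their shared allele; heterozygous loci
    express whichever allele the global map marks as dominant."""
    return [a if a == b else d
            for a, b, d in zip(chrom_a, chrom_b, dominance_map)]
```

The map returned by `make_fixed_map` would be created once, before the first generation, and passed unchanged to every phenotype evaluation.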

In natural organisms, there are mechanisms for the 
dynamic development of the genotype (Section 2.3.4), 
including deletion mechanisms. An organism in which + was 
disadvantageous would be improved by the deletion of the 
locus. There may be a greater proportion of loci in natural
populations in which + is advantageous, due to the deletion
of loci producing deleterious products. Thus, despite the
lack of an explicit mechanism which relates dominance and 
fitness, there may be a correlation between dominance and 
fitness in populations which have developed adaptively. 

Since genetic algorithms with binary representations 
have no deletion operator, the natural mechanism of deletion 
cannot be modelled explicitly. Rather, the relationship 
between dominance and allele value caused by deletions must 
be simulated. The correspondence is simple: the better of
two alleles becomes dominant.

The value of an allele is proportional to its frequency
of occurrence in the population. Therefore, a variable, 
global dominance map may be used. At each generation, the 
probability of an allele being dominant is equal to its 
frequency of occurrence in the previous generation. Every 
time a heterozygote is created, its phenotype is determined 
stochastically according to these probabilities. The 
probabilities apply globally to the individuals in the
population, but vary from generation to generation. 

A less expensive variant on the stochastic, variable,
global map is a deterministic, variable, global map. At
each locus, the allele with the greatest frequency in the
previous generation is declared to be dominant. A new
dominance map is constructed at each generation, but the
determination of dominance for each heterozygote is no
longer probabilistic.
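
The two variable, global maps differ only in how a heterozygous locus is resolved. The sketch below is illustrative and assumes binary alleles, a population stored as a list of chromosome pairs, and ties at a frequency of exactly 0.5 broken in favour of allele 1; none of these details is specified in the text.

```python
import random


def allele_one_frequency(population):
    """Frequency of allele 1 at each locus over all chromosomes.
    `population` is a list of (chromosome, chromosome) pairs of bit lists."""
    chromosomes = [c for organism in population for c in organism]
    n_loci = len(chromosomes[0])
    return [sum(c[i] for c in chromosomes) / len(chromosomes)
            for i in range(n_loci)]


def deterministic_map(freq_of_one):
    """The more frequent allele of the previous generation is dominant.
    Ties go to allele 1 (an arbitrary assumption)."""
    return [1 if f >= 0.5 else 0 for f in freq_of_one]


def phenotype_stochastic(chrom_a, chrom_b, freq_of_one, rng):
    """Each heterozygous locus expresses allele 1 with probability equal to
    that allele's frequency in the previous generation."""
    return [a if a == b else int(rng.random() < f)
            for a, b, f in zip(chrom_a, chrom_b, freq_of_one)]
```

The deterministic variant needs one pass over the population per generation, whereas the stochastic variant also draws one random number for every heterozygous locus of every organism, which is the source of its extra cost.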

The model of two alleles defining the presence or
absence of a product can be generalized to two alleles, each
producing a product. Let a_1 code for product A and a_0 code
for product B. One of A or B may be null. If there is no control
of expression of the products, the phenotypes will be

   a_1 a_1 --> 2A
   a_1 a_0 --> A+B
   a_0 a_0 --> 2B

There is no dominance in this case. If differences in the
quantity of a product do not produce differences in the
phenotypic trait, the result is codominance:

   a_1 a_1 --> A
   a_1 a_0 --> A+B
   a_0 a_0 --> B

Lack of dominance and codominance produce three unique
phenotypic values for the three possible genotypes. 

One of the advantages to diploidy may derive from the 
production of heterozygotes superior to either homozygote. 
This also would require three possible values for the trait 
in the phenotype. 

In a genetic algorithm used for function optimization, 
the dominance map has a range of {0,1}. The two alleles 
present at homologous loci must always map to either 0 or 1.
Therefore codominance and heterozygote superiority cannot be
investigated using a genetic algorithm without major
changes in the representation. The two product model can be
simulated, however, if there is some control of gene
expression. 

It has been observed that chromosome segments sometimes 
become heterochromatic, meaning they condense and appear 
very dark when stained and viewed under a light microscope. 
In this state, the segments seem to form no gene products. 
In mammalian females, one of the two X (sex) chromosomes, 
apparently selected at random, undergoes 
heterochromatization (Hood, Wilson, and Wood, 1975). The 
entire chromosome becomes inactive, leaving the second 
chromosome to provide any products from the X chromosome
genes.

The phenomenon of heterochromatization is modelled by
dominance of a random chromosome. One chromosome is 
selected at random and used to determine the phenotype, as 
if the second chromosome did not exist. 

There is no reason to suppose that heterochromatization
is at all related to chromosome fitness. However, it is 
tempting to consider what the performance of a genetic 
algorithm would be if in all cases the chromosome with 
higher fitness value dominated. Dominance of the better 
chromosome will be included in dominance comparisons, even 
though it has not been observed in natural systems. 
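
Both whole-chromosome schemes reduce to choosing one of the two chromosomes as the phenotype. The following sketch is illustrative only; it assumes bit-list chromosomes and a caller-supplied `utility` function that can score a single chromosome, and the names are hypothetical.

```python
import random


def phenotype_random_chromosome(chrom_a, chrom_b, rng):
    """Model of heterochromatization: one chromosome is silenced at random."""
    return list(rng.choice((chrom_a, chrom_b)))


def phenotype_better_chromosome(chrom_a, chrom_b, utility):
    """Score both chromosomes as if haploid and express the fitter one.
    In this sketch that costs two utility calls per organism."""
    return list(chrom_a if utility(chrom_a) >= utility(chrom_b) else chrom_b)
```

How the two evaluations per organism in the second scheme are counted against a budget of function evaluations is a design decision that the sketch leaves open.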

The goal of a dominance scheme in a genetic algorithm
is to protect inferior alleles by making them recessive.
The construction of a map which benefits an organism in this
manner may best be achieved by allowing the genetic
algorithm to develop dominance maps dynamically. Each
individual in the population carries a third chromosome
which acts during evaluation as the dominance map for that
individual. During the reproductive cycle, this chromosome
behaves like a haploid organism, recombining with the
dominance chromosome of the second parent during mating. It
mutates with the same frequency as the homologous
chromosomes. Offspring creation for organisms with
individual dominance maps is illustrated in Figure 5.2.
Good dominance maps should develop in parallel with good
organisms.

Figure 5.2. Offspring Creation for Diploids with
Individual Dominance Maps
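
Offspring creation with individual dominance maps might be sketched as below. This is an assumption-laden illustration rather than a transcription of the figure: organisms are triples of bit lists, the maps of the two parents are crossed at the same rate as the homologous chromosomes (the text fixes only the mutation rate), and all names are hypothetical.

```python
import random


def crossover(x, y, p_cross, rng):
    """Single-point crossover of two equal-length bit lists."""
    x, y = list(x), list(y)
    if len(x) > 1 and rng.random() < p_cross:
        cut = rng.randrange(1, len(x))
        x[cut:], y[cut:] = y[cut:], x[cut:]
    return x, y


def mutate(chrom, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in chrom]


def offspring(parent_1, parent_2, p_cross, p_mut, rng):
    """Each parent is (chrom_a, chrom_b, dominance_map). The homologous pair
    recombines within each parent; the two maps recombine between parents,
    as a haploid chromosome would."""
    gametes = []
    for chrom_a, chrom_b, _ in (parent_1, parent_2):
        a, b = crossover(chrom_a, chrom_b, p_cross, rng)
        gametes.append(mutate(rng.choice((a, b)), p_mut, rng))
    map_1, map_2 = crossover(parent_1[2], parent_2[2], p_cross, rng)
    child_map = mutate(rng.choice((map_1, map_2)), p_mut, rng)
    return gametes[0], gametes[1], child_map


def phenotype(organism):
    """Resolve heterozygous loci with the organism's own dominance map."""
    chrom_a, chrom_b, dominance_map = organism
    return [a if a == b else d
            for a, b, d in zip(chrom_a, chrom_b, dominance_map)]
```

Because every organism stores three chromosomes instead of two, the space overhead of this scheme is 50% relative to a diploid population with a global map.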


The study of possible models for dominance has revealed 
six dominance schemes which may be useful for genetic 
algorithms: 

fixed, global map, 

stochastic, variable, global map, 

deterministic, variable, global map, 

random chromosome, 

better chromosome, 

individual maps. 


These will be compared in the next section.


5.3 COMPARISONS


5.3.1 Experiment 5: Dominance Schemes 

In Experiment 5, the six dominance schemes were run on
Test Function Set 1 with crossover probabilities of 0.3 and
0.5. The factors were

1. Function (1, 2, 4, 5, 6, 7)

2. Dominance Scheme (random chromosome, better
chromosome, fixed map, stochastic map,
deterministic map, individual maps)

3. Replications (4)

4. Crossover Rate (0.3, 0.5).

The overall best performance is shown in Table 5.1. A
crossover rate of 0.5 is superior to 0.3 two thirds of the
time. A smaller rate is preferable for a stochastic,
variable, global dominance map, which leads to the crossover
rate/dominance scheme interaction.

The interaction between dominance schemes and functions 
is difficult to analyze. Random chromosome dominance 
performed well on the harder functions, especially 
Function 7, while stochastic dominance maps did well on the 
easier functions, 1 and 4. Better chromosome dominance 
displayed poor performance on the easy functions. The other 
schemes had similar performances on all functions. 

Average Maximum Frequencies are shown in Table 5.2. 
Here a crossover rate of 0.5 was clearly superior to 0.3.
The differences were especially noticeable on Function 6.

Table 5.2. Experiment 5: Dominance Schemes.
Average Maximum Frequency after 8000 Function Evaluations
(results averaged over 4 runs)

Better chromosome dominance and stochastic maps
provided the greatest population variance. Random
chromosome was worst in this respect. The selection of
inferior alleles as recessives did preserve population 
variance significantly better than randomly chosen recessive 
alleles. 

The dynamic development of dominance maps, illustrated 
by the individual maps dominance scheme, was not 
particularly effective. Cavicchio (1970) believed that
inversion was not useful in genetic algorithms because the
population sizes and run lengths were too small for such a
subtle operator to show any effect. The dynamic development
of dominance maps may be similar to inversion in that
respect. The performance of the individual dominance maps
scheme was on a par with the deterministic, variable, global
dominance maps scheme, but required 50% more space for the 
dominance chromosomes. 

To summarize, the additional space required for 
individual dominance maps is not justified in view of the 
unexceptional performance of the individual maps scheme. 
Random chromosome dominance performs unevenly and has poor 
variance properties. Better chromosome dominance is 
ineffective on easy functions. The stochastic map scheme 
does poorly on hard functions. Fixed, global dominance maps 
and deterministic, variable, global dominance maps yield 
even performance and medium levels of population variance 


relative to the other dominance schemes. Only these two 




schemes will be considered further. 


5.3.2 Experiment 6: Dominance Change Operators 

Holland (1975) believed that diploid organisms would 
prove most useful to genetic algorithms if there was an 
operator for dominance inversion. Those rare alleles which
had been masked as recessives would occasionally become
dominant and undergo evaluation in the environment. One
possible dominance change operator for a fixed, global
dominance map would require the generation of a new, random
map. Dominance change could also be defined as inversion of
the current map. A fixed, global map would be replaced by a
map with all 0's changed to 1's and vice versa. For a
variable, global map, dominance inversion would consist of 
assigning dominance to the less frequent allele in the 
previous generation. 
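
The dominance change operators described above are simple to state in code. The sketch below is illustrative, assuming binary alleles and maps stored as lists of the dominant allele at each locus; the function names are hypothetical.

```python
import random


def new_random_map(n_loci, rng):
    """Dominance change by regeneration: draw a fresh random map."""
    return [rng.randint(0, 1) for _ in range(n_loci)]


def invert_map(dominance_map):
    """Dominance change by inversion: every 0 becomes 1 and vice versa."""
    return [1 - d for d in dominance_map]


def variable_map(freq_of_one, inverted=False):
    """Variable, global map built from last generation's frequency of
    allele 1 at each locus. With inverted=True the less frequent allele
    is made dominant instead."""
    normal = [1 if f >= 0.5 else 0 for f in freq_of_one]
    return invert_map(normal) if inverted else normal
```

A caller would switch `inverted` on only for the periods during which dominance inversion is meant to be active.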

Experiment 6 tested the fixed map and deterministic, 
variable map dominance schemes against the same schemes with 
periodic dominance changes. For the fixed map scheme, a new 
random map was generated every 2000 function evaluations. 
For the deterministic, variable map scheme, dominance 
inversion was employed. Dominance switched to the less 
frequent allele for the period between 2000 and 3000 
function evaluations and that between 5000 and 6000 function 


evaluations. The crossover rate for all algorithms was 0.5.

The factors for the experiment were:

1. Function (1, 2, 4, 5, 6, 7)

2. Replications (6)

3. Dominance Scheme (fixed map, deterministic map) 

4. Dominance Change Operator (off, on) 

The overall best performance measures, shown in 
Table 5.3, require little explanation. The dominance change 
operators made a significant difference, often improving 
algorithm performance. The variances of the populations in 
the experiment came as a surprise (Table 5.4). The use of
dominance change operators sometimes produced a loss of 
variance. An examination of the behavior of the algorithms 
over time helps to explain why. 

Average Maximum Frequencies for the deterministic map 
scheme with and without dominance change operators are 
plotted in Figures 5.3a and 5.3b. Overall best performance
for the same algorithms is shown in Figures 5.4a and 5.4b.
These are similar to the plots for the fixed map dominance
scheme. 

Assume that after 2000 function evaluations, the 
recessives at each locus were mostly the less fit alleles. 
Reversing the dominance at that point would have subjected 
these poorer alleles to frequent testing in the environment, 
resulting in a rapid reduction in their numbers. At the 
same time, the dominance reversal would have created many 
new phenotypes. This would have yielded some improvement in
algorithm performance. This is what can be observed on the
easier functions, especially Function 1.

Table 5.3. Experiment 6: Dominance Change Operators.
Overall Best Performance after 8000 Function Evaluations
(results averaged over 6 runs)

Table 5.4. Experiment 6: Dominance Change Operators.
Average Maximum Frequency after 8000 Function Evaluations
(results averaged over 6 runs)

Figure 5.3a. Experiment 6: Dominance Change Operators.
Average Maximum Frequency, Dominance = Deterministic Map
(o) Dominance Change Operator Off
(+) Dominance Change Operator On

Figure 5.3b. Experiment 6: Dominance Change Operators.

Figure 5.4a. Experiment 6: Dominance Change Operators.
Overall Best Performance, Dominance = Deterministic Map
(o) Dominance Change Operator Off
(+) Dominance Change Operator On

Figure 5.4b. Experiment 6: Dominance Change Operators.
Overall Best Performance, Dominance = Deterministic Map
(o) Dominance Change Operator Off
(+) Dominance Change Operator On

Now assume that there was no initial advantage to 
either allele at many of the loci, so that dominance drifted 
randomly to one allele or the other. Reversing the 
dominance after 2000 function evaluations would have 
permitted many marginally better (but previously recessive) 
alleles to increase their numbers in the population. The 
result would have been immediate improvements in performance 
and very little loss of variance. This is what can be
observed on the harder functions, most notably on


5.3.3 Experiment 7: Haploidy vs. Diploidy

Experiment 7 was designed to compare the performance of 
haploid and diploid algorithms on function optimization 
problems. Haploidy was represented by an algorithm with 
population size 200, mutation rate 0.001 and crossover rate 0.6.
An algorithm using the deterministic map dominance scheme 
with dominance change operator was selected as a reasonable 
representative of the diploid algorithms. Population size 
was 100, so that space requirements for the two algorithms, 
measured as number of chromosomes in the population, were 
equal. The mutation rate for the diploid algorithm was 
0.000025; the crossover rate was 0.5. Both algorithms used 
deterministic parent selection and the all-time worst value

As before, runs were terminated after 8000 function
evaluations. Thus the haploid algorithm ran for 40
generations, the diploid algorithm for 80. Factors in the
experiment were

1. Function (1, 2, 4, 5, 6, 7)

2. Algorithm (haploid, diploid)

3. Replications (10) 

Table 5.5 gives performance results for Experiment 7. 
Diploidy did not improve overall best performance. Best 
performance was plotted over time (Figures 5.5a and 5.5b), 
to determine whether the diploid algorithm had had time to 
fulfill its potential for search. It appears that on all
functions, both algorithms had reached plateaus of 
performance by 7000 function evaluations. 

Average Maximum Frequencies for the two algorithms are 
shown in Table 5.6. There was a small function/algorithm
interaction, but in general diploidy with the dominance 


change operator did not increase population variance. 


Table 5.5. Experiment 7: Haploidy vs. Diploidy.
Overall Best Performance after 8000 Function Evaluations
(results averaged over 10 runs)

Algorithm:        Haploid     Diploid
Function:
   1              9330        100.00
   2              68.00       65.00
   4              98.94       7235
   5              Wey         99.94
   6              286 OZ      92.65
   7              SO          97.86

algorithms:

Haploid:   parent selection = deterministic
           population size = 200
           crossover rate = 0.6
           mutation rate = 0.001
           ploidy = 1

Diploid:   parent selection = deterministic
           population size = 100
           crossover rate = 0.5
           mutation rate = 0.000025
           ploidy = 2
           dominance scheme = deterministic map
           dominance change operator = on

Source of Variation
   Function
   Algorithm
   Function x Algorithm
   Within Cell

Figure 5.5a. Experiment 7: Haploidy vs. Diploidy.
Overall Best Performance on Functions 1, 2, and 4
(o) Haploid Algorithm
(+) Diploid Algorithm

Figure 5.5b. Experiment 7: Haploidy vs. Diploidy.
Overall Best Performance on Functions 5, 6, and 7
(o) Haploid Algorithm
(+) Diploid Algorithm

Table 5.6. Experiment 7: Haploidy vs. Diploidy.
Average Maximum Frequency after 8000 Function Evaluations
(results averaged over 10 runs)

Algorithm:        Haploid     Diploid
Function:
   1              2832        .840
   2              2390        Oi
   4              2/7 3.0     7 60
   5              20          sol
   6              hos         .780
   7              a Whe)      7138

algorithms:

Haploid:   parent selection = deterministic
           population size = 200
           crossover rate = 0.6
           mutation rate = 0.001
           ploidy = 1

Diploid:   parent selection = deterministic
           population size = 100
           crossover rate = 0.5
           mutation rate = 0.000025
           ploidy = 2
           dominance scheme = deterministic map
           dominance change operator = on

Source of Variation      M.S.       D.F.    F Ratio    Significant
Function                 pOw asd    5       54.00      *
Algorithm                .00520     rh      BA Ths)
Function x Algorithm     .00348     5       Zio S      *
Within Cell              OT sks     108


5.3.4 Experiment 8: Diploidy for Limited Resources

The good early performance of the diploid algorithm in
Experiment 7 suggests another possible application of
diploidy: to problems with severe limitations on time and
space. Experiment 8 compared a diploid and a haploid
algorithm over Test Function Set 1 for runs of only 2000
function evaluations. Population sizes for both algorithms
were limited to 20 individuals. The early performance of
the diploid algorithm in Experiment 7 did not depend on the
dominance change operator, so for Experiment 8 the diploid
algorithm used the deterministic map dominance scheme with
no dominance change operator. Other algorithm parameters
remained unaltered from Experiment 7. The factors were:

1. Function (1, 2, 4, 5, 6, 7)

2. Algorithm (haploid, diploid)

3. Replications (10)

Overall best performance after 2000 function 
evaluations is given in Table 5.7. Plots of overall best 
performance are shown in Figures 5.6a and 5.6b. The diploid algorithm
had a small advantage on Function 5. In general, however,
there was no significant difference between haploidy and
diploidy, despite the fact that the diploid algorithm was
allowed twice as many chromosomes in its population as the
haploid algorithm.


5.4 DIPLOIDY IN RETROSPECT

The failure of diploidy to produce significant
improvements in genetic algorithms for function optimization
can be better understood in light of current theories in
population genetics concerning diploidy. Diploidy and
recombination have been popular subjects in the literature,
beginning with Fisher in 1958 and continuing to the present.
(See Maynard Smith, 1978, Crow and Kimura, 1970, Ewens,
1963, and Fisher, 1958, for general treatments. Li, 19738,
and Felsenstein, 1974, are representative of research papers
on recombination.) In his book, The Evolution of Sex
(1978), Maynard Smith makes several relevant points.


Table 5.7. Experiment 8: Diploidy for Limited Resources.
Overall Best Performance after 2000 Function Evaluations
(results averaged over 10 runs)

Algorithm:        Haploid     Diploid
Function:
   1              ia 0)0      74.40
   2              43.50       36700
   4              ST 523      88.16
   5              85.40       OZ
   6              Silo        84.29
   7              69-55       TS 1a

algorithms:

Haploid:   parent selection = deterministic
           population size = 20
           crossover rate = 0.6
           mutation rate = 0.001
           ploidy = 1

Diploid:   parent selection = deterministic
           population size = 20
           crossover rate = 0.5
           mutation rate = 0.000025
           ploidy = 2
           dominance scheme = deterministic map
           dominance change operator = off

Source of Variation
   Function
   Algorithm
   Function x Algorithm
   Within Cell

Figure 5.6a. Experiment 8: Diploidy for Limited Resources.
Overall Best Performance on Functions 1, 2, and 4

First, it is possible that diploidy is beneficial
during somatic development (growth of the organism after 
fertilization), providing protection against deleterious 
mutations. In this role, diploidy would aid in preserving 
the viability of individual organisms, but would not affect
the adaptive evolution of the population.

Second, the purpose of recombination may be to provide
more rapid evolution of populations. The conditions found
by Maynard Smith under which recombination is beneficial are
listed below.

Heterozygote Superiority. Recombination of diploid 
organisms is desirable when the heterozygote at a locus is 
superior to either homozygote. This situation does not
develop in genetic algorithms for function optimization 
which use binary representations. 

Dependent Loci. The fitness of the combination of
alleles A and B at two loci may be superior to that of A or
B alone, that is, f(AB) > f(A) + f(B). Then if
p(AB,t) < p(A,t)p(B,t), recombination will increase the
number of AB individuals in the population, even if p(AB,t)
is initially zero. Thus recombination benefits the search
for good schemata when the alleles defining the schemata are
already in the population. Recombination and variance in
the population together provide better adaptive search.
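
The effect of recombination on the frequency of the AB combination can be illustrated with the standard two-locus recurrence from population genetics. The sketch below is not taken from the source; it assumes random mating, a recombination fraction r between the two loci, and no selection, mutation or sampling error, so that the allele frequencies themselves stay constant.

```python
def ab_frequency_after_recombination(p_ab, p_a, p_b, r, generations=1):
    """Frequency of the AB combination after random mating with
    recombination fraction r between the two loci. Selection, mutation and
    sampling error are ignored, so p_a and p_b stay constant."""
    for _ in range(generations):
        disequilibrium = p_ab - p_a * p_b   # negative when AB is too rare
        p_ab -= r * disequilibrium          # recombination removes a share r
    return p_ab
```

With p(A) = p(B) = 0.5 and no AB individuals at all, one generation at r = 0.5 gives an expected AB frequency of 0.125, and repeated generations approach the product p(A)p(B) = 0.25.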

Missing Optimal Alleles. When the optimal alleles at
several loci are missing from a population, they must be
introduced by mutation. This condition could be caused by
shifts in the environment which render a previously inferior
allele optimal. If the population does not recombine,
mutations must all occur over several generations in the
same family line in order to produce at least one individual
containing all the mutations.

Recombination allows mutations in two or more 
individuals of the same generation to come together in one 
individual within only a few time steps. So if nP_m > 1,
that is, more than one mutation occurs in one generation, 
recombination will reduce the time needed to produce 
individuals with all of the optimal alleles. Recombination 
in this situation increases the speed with which mutation 
can compensate for the absence of desirable alleles. 

From these cases, it becomes clear that the rate of 
response of a population to environmental factors depends 
not on haploidy and diploidy per se, but on
1) the variance of the population, especially whether or not
alleles are totally absent from the population, and
2) whether or not there is recombination between individuals
in the population.

Diploidy in natural systems may be useful in somatic 
development as a protection against harmful mutations. This 
use does not apply to genetic algorithms. The use of a diploid
reproductive cycle may improve population fitness and 
response to the environment under conditions of heterozygote 
superiority, dependent loci, and missing optimal alleles. 


The keys to this improved response are the maintenance of
population variance and the use of recombination.

It has been shown that in a genetic algorithm, a
population size of 200 and a mutation rate of 0.001 are
sufficient to prevent fixation. Crossover provides a 
powerful tool for recombination. It is apparent after the 
experiments of this chapter that diploidy cannot provide 
more population variance or recombination in a genetic 
algorithm than has already been obtained by haploid 
algorithms employing the crossover and mutation operators. 
However, it must be kept in mind that this result applies 
only to genetic algorithms with binary representations used 
on function optimization problems. 

Diploidy and dominance are not well understood. The 
question "why diploidy?" has not yet been answered for 
natural systems. But if the utility of diploidy does derive 
primarily from somatic differentiation, heterozygote 
superiority, recombination, and maintenance of population
variance, then the benefits of diploidy do not transfer to
genetic algorithms for function optimization.


5.5 CONCLUSIONS

Under diploidy, it is possible to reduce crossover and
mutation rates without reducing levels of recombination or
population variance. However, with respect to overall best
performance on difficult function optimization problems,
diploidy does not improve genetic algorithms.

Diploidy may be useful for dynamic environments, in
which case a dominance scheme which develops dynamically to 
reflect allele fitness is good. Deterministic, variable, 
global dominance maps are an example of such a scheme. 
Dominance change operators can improve overall best 
performance on diploid populations, especially on difficult
functions.


CHAPTER 6


COMPARISON OF GLOBAL FUNCTION OPTIMIZATION METHODS 


Many practical problems in global function optimization
are characterized by a priori ignorance. The number of
dimensions and bounds on each dimension are usually known,
but there is no information about the behavior of the
function. Such "black box" functions require an
optimization algorithm which makes few assumptions about the
nature of the function and works well over a wide range of
functions.

Genetic algorithms optimize bounded functions over any 
number of dimensions. They do not require derivatives and 
are not limited to functions which are unimodal in Euclidean 
space. Logically, the best area in which to apply genetic
algorithms in optimization of Euclidean functions is the
area of global "black box" optimization. This chapter
investigates that application.


6.1 NUMERICAL METHODS 

Numerical methods for global optimization can be 
categorized according to what assumptions they make about 
the function. Methods which are tailored to a class of 
functions will be superior to general methods on those 
functions, but their performance elsewhere may not be 
predictable. Therefore, for "black box" functions it is 
desirable to minimize the number of assumptions made. The 
following is a catalog of the best numerical methods 
currently available which do not require continuity or 
differentiability of the function. 

No Assumptions. The only methods which make no 
assumptions whatsoever about the function are random search 
methods. Pure random search consists of the evaluation of a 
given number of points chosen at random from the solution 
space. Modifications to pure random search such as 
Anderssen's (1972) are also unrestricted in application, but 
have not yet been well analyzed. 
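
As an illustration only, pure random search can be sketched in a few lines of Python. The name `pure_random_search`, its arguments, and the use of uniform sampling within the bounds are assumptions made for this sketch; the text specifies only that a given number of points is chosen at random from the solution space.

```python
import random

def pure_random_search(f, bounds, n_evals, rng=None):
    """Evaluate n_evals uniformly random points and keep the best (maximization)."""
    rng = rng or random.Random()
    best_x, best_val = None, float("-inf")
    for _ in range(n_evals):
        # One independent uniform draw per dimension (an assumption of this sketch).
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

Because no point influences the choice of the next, the method makes no use of any structure in the function.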

Grouped Optimal Points. Several methods assume only 
that optimal points are grouped in neighborhoods in 
Euclidean space. If this is the case, searching should be 
intensified near previously located good points. Creeping 
random search as proposed by Brooks (1958) centers a set of 
normally distributed trials around an initial point, then 
moves the search to center around the best point found 
before repeating the process. Clustering algorithms have 
also been employed to locate optimal neighborhoods after an 
initial set of random trials has been completed (Price, 
OTe 
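
The creeping random search idea described above can be sketched as follows. This is a minimal illustration, not Brooks' procedure in detail: the function name, the fixed standard deviation `sigma`, the clipping of trials to the bounds, and the rule of keeping the current center when no trial improves on it are all assumptions of the sketch.

```python
import random

def creeping_random_search(f, bounds, start, sigma, trials, steps, rng=None):
    """Repeatedly sample normal trials around a center and move to the best point."""
    rng = rng or random.Random()
    center = list(start)
    center_val = f(center)
    for _ in range(steps):
        best, best_val = center, center_val
        for _ in range(trials):
            # Normally distributed trial around the current center, clipped to bounds.
            x = [min(hi, max(lo, rng.gauss(c, sigma)))
                 for c, (lo, hi) in zip(center, bounds)]
            val = f(x)
            if val > best_val:
                best, best_val = x, val
        # Re-center the search on the best point found in this step.
        center, center_val = best, best_val
    return center, center_val
```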

Smoothness. If the function is assumed to be 
reasonably smooth in Euclidean space, a grid can be imposed 
on the solution space such that each grid sector will 
probably contain at most one relative optimum. These 


sectors can then be searched by evaluating sector centroids 
(Brooks, 1958) or by using some method for finding local 
optima (e.g. Hill, 1969). The difficulty with such 
methods lies in the determination of an initial sector size 
which corresponds to the smoothness of the function. 
Failure to adopt the appropriate initial grid is usually 
disastrous. 

Quadratic Smoothness. If a function behaves much like 
a quadratic polynomial in the neighborhoods of local optima, 
it can be searched by local direct search or descent 
methods. Jarvis (1975), Jacoby, Kowalik, and Pizzo (1972), 
Kowalik and Osborne (1968), and Fletcher (1965; 1969) 
provide reviews of these methods. A global optimum can be 
located by running these local methods in global mode, 
i.e. applying them repeatedly from different starting 
points. 

On one-dimensional functions, search by quadratic 
approximation is often used. For higher numbers of 
dimensions, the best direct search methods are Rosenbrock's 
method as modified by Davies, Swann, and Campey (Box, 
Davies, and Swann, 1969; Swann, 1964), the Simplex method of 
Nelder and Mead (1965), and Powell's method (Powell, 1964; 
Brent, 1972). Gradient methods, which use derivatives to 
find the direction of greatest change in the function, can 
often be altered so that the derivatives are approximated. 
Davidon's variable metric method as modified by Stewart 
(Fletcher and Powell, 1963; Stewart, 1967) is a good 


example. 

The Stewart and Powell methods are probably the most 
powerful local methods in terms of speed of convergence in 
the later stages of search. However, the method of Davies, 
Swann, and Campey exhibits good early performance. It also 
may be preferable for functions with large numbers of 
dimensions and may perform better than other quadratic 
methods on functions which do not satisfy assumptions of 
quadratic smoothness (Fletcher, 1965). 

Hamming Space Linear Independence. Genetic algorithms 
do not require smoothness of the function in Euclidean 
space. Instead, their performance is related to the 
linearity of the function when transformed into Hamming 
space. 

In summary, the only methods which make no assumptions 
about the function are random search methods. Other methods 
may be valid on "black box" functions, however, if their 
performance is sufficiently robust. The most likely 
candidates for global optimization in environments of 
a priori ignorance are pure random search (PRS), creeping 
random search (CRS), the Davies, Swann, and Campey method 
used in global mode (DSC), and the genetic algorithm (GA). 
These will be compared in Section 6.3. 


6.2 TEST FUNCTION SET II: ALGORITHM COMPARISON 

In order to compare the four global optimization 
methods selected in Section 6.1, it is necessary to have a 
set of test functions which will challenge the robustness of 


each method. In addition, all the functions must be bounded 
and have two or more dimensions, so that all four methods 
are applicable. The function characteristics which should 
be varied within the test set are smoothness in Euclidean 
space, Euclidean modality, and Hamming space linear 
dependencies. Also, DSC may prove sensitive to the number 
of dimensions in the function, so this should be varied as 
well. 

The following ten functions comprise Test Function 
Set II. Three functions were taken from the literature. 
However, few experiments have been published which use 
highly multimodal functions, so Functions 3 through 8 and 10 
were constructed to provide the desired variations in 
modality. The bounds for most functions were set so that 
the genetic algorithm could search spaces with a resolution 
for each x-value of $10^{-10}$ and a space size of some power of 
two. The main features of each function are given in 
Table 6.1. 
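
To make the relation between bounds, resolution, and chromosome length concrete, the following sketch decodes a bit string into an x-value on a grid of spacing 10^-10. The bounds quoted for Function 1 (-3.4359738368 to 3.4359738367) span 2^36 grid points, which suggests 36 bits per x-value. The function `decode`, the constants, and the use of plain (non-Gray) binary coding are assumptions made for this illustration.

```python
def decode(bits, low_steps, resolution=1e-10):
    """Map a bit string to an x-value: (lowest grid index + integer value) * resolution."""
    return (low_steps + int(bits, 2)) * resolution

# Hypothetical layout for one x-value of Function 1: 36 bits, lowest grid index -2**35.
N_BITS = 36
LOW_STEPS = -2 ** 35

x_min = decode("0" * N_BITS, LOW_STEPS)   # lower bound, about -3.4359738368
x_max = decode("1" * N_BITS, LOW_STEPS)   # upper bound, about  3.4359738367
```

With this layout every chromosome decodes to a point inside the bounds, so no repair step is needed.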


6.2.1 Function 1 

$f_1(X) = 100 - 100(x_1^2 - x_2)^2 - (1 - x_1)^2$ 
Bounds: $-3.4359738368 \le x_i \le 3.4359738367$, for $i = 1, 2$. 

Maximum: $f_1(1,1) = 100.0$ 
Function 1 is Rosenbrock's valley (Rosenbrock, 1960), 
negated to form a maximization problem. It is a quartic 
function which is unimodal in Euclidean space, but not 
strongly unimodal. It is representative of the unimodal 
functions which are used to compare direct search and 
descent methods in the literature. 

Table 6.1. Test Function Set II. 

Function                        Number of    Euclidean      Hamming 
                                Dimensions   Modality       Linearity 

 1  Rosenbrock's Valley             2        1              Nonlinear 
 2  Shekel's Foxholes               2        25             Nonlinear 
 3  2-D Fourier Sum                 2        10^8           Nonlinear 
 4  Damped Sine                     2        64             Nonlinear 
 5  10-D Fourier Sum A             10        59049          Nonlinear 
 6  10-D Fourier Sum B             10        59049          Linearly Independent 
 7  Product of Fourier Pairs       10        ?              Nonlinear 
 8  Full Fourier Product           10        ?              Nonlinear 
 9  Step Function                  10        1              Linearly Independent 
10  Hamming Function                5        3.5 x 10^13    Linearly Independent 
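
For readers who wish to experiment, Function 1 can be written directly in Python. The form below is an assumption: it is the standard Rosenbrock valley, negated and shifted by 100 so that the maximum of 100 falls at (1, 1) as stated in the text.

```python
def f1(x1, x2):
    """Negated Rosenbrock valley, shifted so the maximum value is 100 at (1, 1)."""
    return 100.0 - 100.0 * (x1 ** 2 - x2) ** 2 - (1.0 - x1) ** 2
```

The curved ridge along x2 = x1^2 is what makes the function unimodal but awkward for methods that search along coordinate axes.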


6.2.2 Function 2 

Let $A = (a_1, \ldots, a_{25}) = (-32, -16, 0, 16, 32, \ldots, -32, -16, 0, 16, 32)$, 
$B = (b_1, \ldots, b_{25}) = (-32, -32, -32, -32, -32, -16, \ldots, 32, 32, 32, 32, 32)$. 

$f_2(X) = 101 - 1 / \left( .002 + \sum_{i=1}^{25} \frac{1}{i + (x_1 - a_i)^6 + (x_2 - b_i)^6} \right)$ 
Bounds: $-54.975581388 \le x_i \le 54.975581387$, for $i = 1, 2$. 

Maximum: $f_2(-32,-32) \approx 100.0$ 
Function 2 is Shekel's foxhole function as used by 
De Jong (De Jong, 1975; Shekel, 1971). It also has been 
negated to form a maximization problem. The surface is 
fairly flat with a minimum value of about -400. There are 
25 steep peaks at the points $(a_i, b_i)$, $i = 1, \ldots, 25$, each with 
a relative maximum value of about $101 - i$. 
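
The properties just described (a plateau near -400, 25 peaks, and a best peak of about 100 at (-32, -32)) can be checked numerically. The sketch below assumes De Jong's form of the foxhole function, negated and shifted by 101; the names `A`, `B`, and `f2` and the ordering of the 25 peak coordinates are assumptions of this illustration.

```python
# Peak coordinates on a 5 x 5 grid; a_i cycles fastest (assumed ordering).
A = [-32, -16, 0, 16, 32] * 5
B = [b for b in (-32, -16, 0, 16, 32) for _ in range(5)]

def f2(x1, x2):
    """Negated, shifted Shekel foxholes in De Jong's form (maximization)."""
    s = 0.002
    for i, (a, b) in enumerate(zip(A, B), start=1):
        s += 1.0 / (i + (x1 - a) ** 6 + (x2 - b) ** 6)
    return 101.0 - 1.0 / s
```

Away from every peak the sum is dominated by the constant .002, which gives the flat floor of roughly 101 - 500.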
6.2.3 Function 3 
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WHET COC] LOT a= liters 7 5 


14 =O oO LOym2oO2o Oya Oe0 096,21 P2002 7 20004000 2F, 


aj, = 21218, 207646, 2060602, 19531250, 199720098, 

by5 = OSB 56105 1 -meo0 582 345-10. eee oAnasodae tone 
Sc 17650 64u nie ym =6 18492 72s Okey 

bo, = 2289 300M o> =n GIO eos 2, Cason 2. 
Paci | 3, Asaisee a 


Bounds: $0 \le x_i \le .0001048575$, for $i = 1, 2$. 
$.0001048576 = 2^{20} \times 10^{-10}$. 

Maximum: $f_3(2^{19} \times 10^{-10}, 2^{19} \times 10^{-10}) = 100.0$ 
Function 3 is the sum over two dimensions of Fourier 
sums which are similar to Function 6 in Test Function Set I 
(Section 3.4.6). It contains roughly $10^8$ relative maxima. 
The periods of all the sine waves are relatively prime to 
each other and to 2. 


6.2.4 Function 4 


7 30 - 
Let Vas x, /(2 10 


2 
Oy ha ee fa x -.25 
EA ne oe 50) (1 y;) *sin(l6w(y, + .05874169672) ‘e 
i=l 
Bounds: $0 \le x_i \le .1073741823$, for $i = 1, 2$. 
$.1073741824 = 2^{30} \times 10^{-10}$. 

Maximum: $f_4(0,0) \approx 100.0$ 
Function 4 is the sum over two dimensions of the 
dampened sinusoid of Test Function Set I (Section 3.4.7). 
The function has eight ridges crossing the space in each of 
the directions of the $x_i$. It is smooth in Euclidean space, 
but has very unevenly spaced relative maxima. 


6.2.5 Function 5 


Let $X = (x_1, \ldots, x_{10})$. 

$f_5(X) = 5 \sum_{i=1}^{10} \sum_{j=1}^{2} \sin(\pi a_j (x_i + b_j))$, 
where a, = 19531250, a, = 78125000, b, = Semi 
ae ane 
ba = Ord 10 e 
Bounds: $0 \le x_i \le .0000001023$, for $i = 1, \ldots, 10$. 
$.0000001024 = 2^{10} \times 10^{-10}$. 


Maximum: £5(x) =8100.0) for xe Beil Qei0n oF 


PED ipictetehy LO. 
Function 5 is the sum of 10 Fourier sums. Each Fourier 
sum has two component sine waves, one with one peak and one 
with three. This gives a smooth, high-dimensional surface 
with $3^{10} = 59049$ relative maxima evenly distributed over the 
space. 


6.2.6 Function 6 


Let X = (Xj ree+rXyQ)- 


Oe 22 
waxes) 3 Soe Vetntnee = ly). 
i=l k=l 


eal 
where a, = 19531250, a, = 78125000, b, = Epon 2... Dy = We 
Bounds: 0 < x, < 0000001023, for i=1,...,10. 
Maximum: £6(X) ~ 100.0 for x, = ioe One 


Vl creer pes 
Function 6 is similar to Function 5, but the sine waves 
for each dimension are shifted slightly so that Hamming 
space nonlinearity is reduced. 


6.2.7 Function 7 

Let $X = (x_1, \ldots, x_{10})$. 

$f_7(X) = 5 \sum_{i=0}^{4} \prod_{k=1}^{2} \sum_{j=1}^{2} \sin(\pi a_j (x_{2i+k} + b_j))$, 

where $a_j$, $b_j$ are as in Function 5, for $j = 1, 2$. 
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100.0 for x, = eo hom 


Bounds: 0 iS Xs 
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Maximum: £7 (X) 
hea} ills 
Function 7 is the sum of five products involving the 
ten Fourier sums used in Function 5. Thus, in contrast to 
Functions 3, 4, 5, and 6, Function 7 will have some 
nonlinearity. 


6.2.8 Function 8 

Let $X = (x_1, \ldots, x_{10})$. 

$f_8(X) = .09765625 \prod_{i=1}^{10} \sum_{j=1}^{2} \sin(\pi a_j (x_i + b_j))$, 

where $a_j$, $b_j$ are as in Function 5, for $j = 1, 2$. 
Bounds: $0 \le x_i \le .0000001023$, for $i = 1, \ldots, 10$. 

Function 8 is the product of the ten Fourier sums used 
in Function 5. 


6.2.9 Function 9 

Let $X = (x_1, \ldots, x_{10})$. 

Function 9 is a discontinuous, high-dimensional step 
function. It has one relative maximum value, which occurs 
over a hypercube in one corner of the solution space. The 
corner chosen has five high x-values and five low ones, to 
discourage a method which searches all dimensions in the 
same direction initially. 


6.2.10 Function 10 


Let X = (Xp reer Xo) 


5 
C1LOWX)) = 2) Hamming Distance of x; J Om D2. 
i=1 
Bounds: 0 < x, < 1023, for i=1,...,5. 
Maximum: £10(X) = 100 when 512 < Xs Kms 
1 =) 
Function 10 is a five dimensional version of the 
Hamming function (Function 1) of Test Function Set I 
(Section 3.4.1). It is linearly independent in Hamming 
space and has $2^{45} \approx 3.52 \times 10^{13}$ peaks in Euclidean space. 
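
A short computation shows how a function can be linear in Hamming space yet highly multimodal in Euclidean space. The sketch assumes that each x-value is a 10-bit integer, that the function counts 1 bits (Hamming distance from 0) and scales the total by 2, and that a point is a peak when neither integer neighbour is higher (ties allowed). The names `ones`, `f10`, and `count_peaks_1d` are hypothetical.

```python
def ones(x):
    """Number of 1 bits in x, i.e. its Hamming distance from 0."""
    return bin(x).count("1")

def f10(xs):
    """Assumed form of the five-dimensional Hamming function."""
    return 2 * sum(ones(x) for x in xs)

def count_peaks_1d(n_bits=10):
    """Count integers whose bit count is not exceeded by either neighbour."""
    top = 2 ** n_bits - 1
    peaks = 0
    for x in range(top + 1):
        left = ones(x - 1) if x > 0 else -1
        right = ones(x + 1) if x < top else -1
        if ones(x) >= left and ones(x) >= right:
            peaks += 1
    return peaks
```

Under these assumptions every odd integer is a peak in one dimension, and the count for five independent dimensions is the fifth power of the one-dimensional count.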


6.3 EXPERIMENT 9: GLOBAL OPTIMIZATION 
In order to compare the four methods of Section 6.1, it 
is necessary to set up similar modes of operation for each 
method. The commonly used global mode for DSC involves 
running the method to convergence on each of several random 
starting points. However, the genetic algorithm is always 
allowed an initial population of 200 random starting points. 
The genetic algorithm should logically be compared to a DSC 
method which searches in parallel starting from 200 random 
points. Thus for CRS, DSC, and GA, two running modes will 
be used: 

Convergence mode ("c" mode). Run until convergence 
Starting from one random point (or random population), then 
restart. Continue up to a maximum of 10000 function 
evaluations. 

Fixed starts mode ("f" mode). Use a fixed number of 
random starting points. Spend the same amount of time 
searching from each point, so that the total number of 
function evaluations is 10000. 
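
The two running modes can be sketched around any local method. In the illustration below, `hill_climb` is a toy stochastic hill climber standing in for the local method (it is not DSC), "convergence" is approximated by a run of consecutive failures, and fixed starts mode is run sequentially with equal budgets rather than in parallel. All names and parameter values are assumptions of the sketch.

```python
import random

def hill_climb(f, x, budget, rng, step=0.1, patience=20):
    """Toy local method: stop when the budget is spent or after `patience` failures."""
    val, used, fails = f(x), 1, 0
    while used < budget and fails < patience:
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        cand_val = f(cand)
        used += 1
        if cand_val > val:
            x, val, fails = cand, cand_val, 0
        else:
            fails += 1
    return val, used

def random_point(bounds, rng):
    return [rng.uniform(lo, hi) for lo, hi in bounds]

def convergence_mode(f, bounds, total, rng):
    """Run to convergence from a random start, then restart, until `total` evaluations."""
    best, used = float("-inf"), 0
    while used < total:
        val, n = hill_climb(f, random_point(bounds, rng), total - used, rng)
        best, used = max(best, val), used + n
    return best, used

def fixed_starts_mode(f, bounds, total, n_starts, rng):
    """Spend an equal share of `total` evaluations on each of n_starts random starts."""
    per_start = total // n_starts
    best, used = float("-inf"), 0
    for _ in range(n_starts):
        # patience=per_start disables early stopping, so each start uses its full share.
        val, n = hill_climb(f, random_point(bounds, rng), per_start, rng,
                            patience=per_start)
        best, used = max(best, val), used + n
    return best, used
```

Both modes consume the same total budget; they differ only in how that budget is divided among starting points.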

The parameters used for each of the methods are 
described in Appendix D. Seven methods, PRS, CRS_c, CRS_f, 
DSC_c, DSC_f, GA_c, and GA_f, were run 10 times each on each 
function in Test Function Set II. 

The overall best performances of the methods are given 
in Table 6.2. Creeping random search surpassed pure random 
search on only one function, and was worse on several. 
Since CRS is more expensive than PRS to use, requiring the 
generation of normally distributed points, it need not be 
considered further as a global optimization technique. 

DSC exhibited marked differences in performance 
depending on the mode of operation used (Table 6.3). DSC 


employed only 50 to 100 random starting points when in 


convergence mode. On the dampened sinusoid (Function 4), 

Table 6.2. Experiment 9: Global Optimization. 
Overall Best Performance after 10000 Function Evaluations 
(results averaged over 10 runs) 


Function 


OWON DU &WN FE 


Fo 


PRS CRS (Cc) 


99.99 98.44 
99-9598 505.04 
Ore) Ope Ute bo 
82.06 44.84 
Diliateme) © Alene! 
59204 23.46 


ell 20.00 


3.54 OL 


80) 0a 5 2/1 0 
76.80 63.00 


Method? 

CRS (i) DSE (Cc) 
99.99 100.00 
9973509 oo 
78.769 290—70 
S274 Upeelie 22 
Slo Um Ore O. 
Sle A aee O32 2 
ZORU0N 19.82 
a99e SuUe79 
V2 Oe 9390 
WsiA PA) ish ay) 


pure random search 
creeping random search 


numerical method of Davies, 


genetic algorithm 
convergence mode 
fixed starts mode 


Source of Variation 


Method 
Function 


Method x Function 


Within Cell 


19200.00 
66400.00 
1344.00 
185280 


DSC (CE) 


100.00 
100.00 
90.82 
88.79 
Sl.23 
sia) reais) 
19 97 
10.27 
96.40 
85:00 


hetabio, olgnlpicant 


EIS) 5 280) 
397330 
1.23 


GA (c) 


9:92.98 
99.45 
92.34 
D302 
Ue Ge) 
OGre2 
20.00 
Bie Oe 
93.00 
93.40 


Swann, and 


GA (£) 


99 96 
99730 
eyes JES} 
92.41 
roel INS) 
92.40 
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4.50 
Table 6.3. Experiment 9: Global Optimization. 
Overall Best Performance of DSC 
after 10000 Function Evaluations 
(results averaged over 10 runs) 


Fsrau1o Signieicane 


FunCcei1On Mode 
Convergence Fixed Starts 
1 100.00 100,00 
2 SES es Sia 100.00 
3 90.70 O05 82 
4 AVe22 88.79 
5 S6.20 Slve23 
6 OAM S51 
7 19:.:82 19597 
8 31.79 POR 27. 
9 98.90 96.40 
10 88.40 85.00 
Source of Variation M.S. Dae. 
Method 380.80 7 DRI EE TA) 
Function 18620.00 9 1350.00 
Method x Function 346.80 9 Phy VAD 
Within Cell 3.76 180 

using 200 random starting points in fixed starts mode 
increased the probability of initiating hill-climbing on one 
of the narrower, higher ridges. On the more regular Fourier 
sums (Functions 5 and 6), DSC performed well when allowed to 
converge on a peak after each start. 

The genetic algorithm was not affected by operating 
mode (Table 6.4). The algorithm maintained population 
variance at a level which prevented convergence within 10000 
function evaluations, and the algorithm was rarely restarted 
on a new, random population. Thus the two modes behaved the 
same. Because fixed starts mode is the natural way of 
running the genetic algorithm, only GA_f will be considered 
in further discussions. 

The genetic algorithm always equalled or surpassed pure 
random search. DSC, however, offered stiff competition. 
Plots of overall best performance of PRS, DSC_c, and GA_f on 
all ten functions are given in Figures 6.1a, b, c, and d. 
Each point is an average of 10 runs. Note that the vertical 
scale varies from function to function. 

On all functions except the linearly dependent Fourier 
products (Functions 7 and 8), the genetic algorithm 
exhibited a steadily improving performance curve. On 
functions linearly independent in Hamming space 
(Functions 6, 9, and 10), the plots indicate that useful 
search was still in progress at the arbitrary cutoff point 
of 10000 function evaluations. On Functions 7 and 8, 
however, locus dependencies prevented effective search. 

Table 6.4. Experiment 9: Global Optimization. 
Overall Best Performance of GA 
after 10000 Function Evaluations 
(results averaged over 10 runs) 


Function Mode 
Convergence Fixed Starts 

1 99.98 99.98 

2 99.45 99.80 

S 92134 91.13 

4 93°.62 92.41 

5 19.63 O83 

6 S8.21 92.40 

7 20R0.0 20'..0:0 

8 PA SO 4.50 

9 937,00 O20 

10 93.40 93.40 
Source Of Variation Mos. Deb ho kealor oom evcant 
Method 12°66 1 dia SiS) 
Function PESO) 9 22650'700 * 
Method x Function Isis 9 1.46 
Figure 6.1a. Experiment 9: Global Optimization. 
Overall Best Performance on Functions 1, 2, and 3 
Pure Random Search 
DSC Numerical Method 
Genetic Algorithm 

Figure 6.1b. Experiment 9: Global Optimization. 
Overall Best Performance on Functions 4, 5, and 6 
(+) Pure Random Search 
(o) DSC Numerical Method (Convergence Mode) 
(*) Genetic Algorithm (Fixed Starts Mode) 

Figure 6.1c. Experiment 9: Global Optimization. 
Overall Best Performance on Functions 7 and 8 

Figure 6.1d. Experiment 9: Global Optimization. 
Overall Best Performance on Functions 9 and 10 
(+) Pure Random Search 
(o) DSC Numerical Method (Convergence Mode) 
(*) Genetic Algorithm (Fixed Starts Mode) 

The performance curves of DSC differed, depending on 
the function. As expected, the maximum for Rosenbrock's 
valley (Function 1) was obtained well before 500 function 
evaluations. On the other Euclidean unimodal function 
(Function 9), a good value was also achieved very early. 
DSC does not handle bounds on the function efficiently. 
This, and not the discontinuities in the function, may 
account for its failure to locate the global maximum on 
Function 9. 

On functions with narrow peaks (Functions 2 and 4, 
possibly Function 8), DSC spent a great deal of time 
converging on low flat hills, but made sudden, rapid 
improvements whenever a random start point landed on one of 
the rarer sharp peaks. Elsewhere, the method quickly 
climbed those hills that its starting points located, 
yielding performance curves which paralleled those of PRS, 
but were higher by values of 10 to 30. 

The different rates of improvement of the two methods 
make comparisons of final values difficult. Tables 6.5, 
6.6, and 6.7 show overall best performance values after 
2000, 5000 and 10000 function evaluations, respectively. 
The functions are grouped according to whether DSC, neither, 
or GA is superior. Except for Functions 2 and 4 (Shekel's 
foxholes and the dampened sinusoid), DSC has better early 
performance (0 to 2000 function evaluations). The genetic 
algorithm is a good long-term optimizer, failing badly only 

Table 6.5. Experiment 9: Global Optimization. 
Overall Best Performance of DSC versus GA 
after 2000 Function Evaluations 
(results averaged over 10 runs) 


Function Method 


DSC (Convergence) GA(Fixed Starts) 


2 95.43 90.05 

3 85.08 Piston a 

5 84.69 62). 66 

6 SMES AAs) 66.99 

8 PALA Sry) 06 

g 96.30 83.60 

£0 84.40 tie 80 

aul 100.00 9935 

y bck y/ ti 2.07..00 

4 60.14 84.89 
Source of Variation M.S. DoE. SE =bavho oLgnrt icant 
Method 20i2.27..0'0 i 60.20 * 
Function 137307300 9 55:8°..00 * 
Method=x function 1092.00 9 32500 * 
Table 6.6. Experiment 9: Global Optimization. 
Overall Best Performance of DSC versus GA 
after 5000 Function Evaluations 
(results averaged over 10 runs) 


Function Method 


DSC (Convergence) GA(Fixed Starts) 


5 86.10 75.96 
6 92225 Saree 
8 2764 1.90 
9 98.00 89.40 
HI 100.00 99.96 
2 98.51! 99.79 
3 89.09 Sits 
7 19.80 20200 
10 86.80 87.00 
4 T5213 89.98 
Source, Of Variation M.S. Defay bee BO SILOM Grcant 
Method 688.90 I 37600 
Function 19560.00 9 1050.00 
Method x Function 557390 9 30.00 * 
Within Cell £8.60 180 
Table 6.7. Experiment 9: Global Optimization. 
Overall Best Performance of DSC versus GA 
after 10000 Function Evaluations 
(results averaged over 10 runs) 


Eunction Method 


DSC (Convergence) GA(Fixed Starts) 


5 86.10 Spl x kes 

8 aL79 Are 50 

9 98.90 Di2to 0 

iL 100.00 99.98 

2 99.31 99.80 

3 NOs 70) Onlve ies 

6 OBrye22 924.0 

1 ore Si2 20.00 

4 GSE? 92541 

10 88.40 93.40 
Source: Of) Varzation Mac Dee eatatelor SIO Cant 
Method 165%.6.0 if L050 * 
Function 19660.00 9 1240.00 * 
Method x Function 57 4. DU 9 36.0 0 * 
Within Cell Sigs, ale 0) 


6.4 CONCLUSIONS 


Both the DSC numerical method and the genetic algorithm 


are preferable to random search for "black box" global 
optimization. On functions with very low Euclidean 
modality, DSC is preferable, since it converges rapidly to 
optimal values, whether or not the function is smooth. On 
multimodal functions, both DSC and the genetic algorithm do 
fairly well. 

If the peaks of a multimodal function are narrow 
(Functions 4 and 10), they are unlikely to be found by DSC. 
In this case, the genetic algorithm is superior, since the 
translation it makes to Hamming space for searching may 
reveal relationships between peaks which are invisible to 
DSC. 

To summarize, the performance of a numerical method is 
dependent upon the shape of the function in Euclidean space, 
while a genetic algorithm is affected by the shape of the 
function in Hamming space. DSC is at its best when 
optimizing unimodal surfaces in Euclidean space, while a 


genetic algorithm does best on functions which are linearly 


independent in Hamming space. If, however, a function has a 


high degree of nonlinearity in Euclidean space, which is 
accompanied by nonlinearity in Hamming space, then both 


methods are inadequate. 


CONCLUSIONS 


Genetic algorithms are adaptive search methods which 
employ some of the mechanisms for adaptation in natural 
evolution. Given the apparent success of natural evolution 
and the intuitive appeal of adaptive strategies in problem 
solving, it is natural to anticipate many successful 
applications for such an algorithm. In Chapter 1, two major 
questions were posed about these algorithms: what is the 
domain of problems which genetic algorithms are useful in 
solving, and which genetic algorithms are desirable for such 
use. This dissertation has attempted to answer these 
questions for genetic algorithms applied within the field of 
function optimization. 

In mathematical function optimization, the performance 
measure which is of primary interest is the overall best 
solution achieved by a method. For overall best 
performance, it is desirable to maximize the search 
capability of the genetic algorithm by maintaining high 
levels of population variance, usually at the expense of 
online, or average, performance. Experiment 2 in Chapter 3 
indicated that this can be accomplished by using large 
populations (200 individuals) and choosing a mutation rate 


inversely proportional to population size. 

Population variance also benefits from controlling the 
error accumulated during parent selection. When parents are 
sampled for each generation, the sampling method may involve 
bias or variance. If so, there may be a steady drift over 
generations away from those populations which would reflect 
most ideally the concept of selection according to fitness. 
In Chapter 4 several sampling methods were compared for 
bias, variance, and effect on genetic algorithm performance. 
Experiment 4 revealed that population variance is affected 
by sample variance, which can be minimized by the use of a 
deterministic, rounding technique for sampling. Good 
overall best performance can be achieved by the use of 
either stochastic sampling without replacement or the 
deterministic method. 
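
One way to realize a deterministic, rounding technique for parent sampling is sketched below. It is an illustration under stated assumptions, not necessarily the exact procedure of Chapter 4: each individual's expected number of offspring is taken as proportional to its fitness, the integer part is assigned outright, and the remaining slots go to the largest fractional parts. The function name is hypothetical and all fitness values are assumed positive.

```python
import math

def deterministic_sample(fitness, n):
    """Return the number of copies of each individual among n parents."""
    total = sum(fitness)
    expected = [n * f / total for f in fitness]
    # Integer parts of the expected counts are assigned without any randomness.
    counts = [math.floor(e) for e in expected]
    remaining = n - sum(counts)
    # Remaining slots go to the individuals with the largest fractional parts.
    order = sorted(range(len(fitness)),
                   key=lambda i: expected[i] - counts[i], reverse=True)
    for i in order[:remaining]:
        counts[i] += 1
    return counts
```

Because no random draw is involved, the realized counts never differ from the expected counts by a whole offspring or more, which is the sense in which sample variance is minimized.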

It had long been hypothesized that diploidy and 
dominance could assist in the maintenance of population 
variance in a genetic algorithm. After Experiments 5 
through 8 in Chapter 5, however, it must be concluded that 
diploidy does not raise levels of population variance above 
those achieved by large haploid populations using minimum 
variance parent sampling. No major benefits derived from 
the use of diploidy and any of six dominance schemes in 
improving algorithm performance. 

This result is better understood in light of developing 
theories in population genetics on the causes and purposes 
of dominance. Diploidy may be beneficial in natural 


organisms as a defense against deleterious mutations, and 
may improve organisms in situations of heterozygote 
superiority or missing optimal alleles. None of these 
benefits would apply to genetic algorithms for function 
optimization which use binary chromosomes. 

The question of where genetic algorithms are applicable 
in static function optimization can now be answered. In 
Chapter 6 a set of ten test functions was designed which 
incorporated a number of levels of dimensionality, modality, 
and continuity. On these, a genetic algorithm was compared 
to random search and a "direct search" numerical method. 
Although genetic algorithms are superior to random search 
for global function optimization, they cannot be considered 
any more robust than a good local numerical method run in 
global mode. Such numerical methods are designed for 
functions which are unimodal in Euclidean space. They can 
be designed to handle discontinuity, high dimensionality, 
and the absence of explicit gradient information. 
Multimodality, however, invariably impairs their 
performance. 

Likewise, the performance of genetic algorithms is not 
affected by continuity, dimensionality, or 
differentiability. In addition, genetic algorithms are 
capable of optimizing some highly multimodal functions. 

What does affect these algorithms are the nonlinearities of 
a function when transformed into an equivalent function in 
Hamming space. Genetic algorithms perform well on functions 


which are linearly independent in Hamming space; 
nonlinearities among many loci impair their performance. 

Thus genetic algorithms with binary chromosomes are 
useful on any function where it is suspected that 
performance is determined by the individual loci of a binary 
number or set. That is, in order for a genetic algorithm to 
be effective, there must be not only a convenient binary 
representation for solutions, but the elements of this 
representation must have a straightforward effect on the 
function value. This is not usually the case when binary 
chromosomes are translated into real x-values for evaluation 
in a difficult function in Euclidean space. Functions which 
are multimodal but smooth in Euclidean space, that is, two 
neighboring points in Euclidean space have close values, are 
probably best optimized by descent methods. 

This research has concentrated on genetic algorithms 
with binary chromosomes applied to unconstrained function 
optimization. Many other applications for genetic 
algorithms remain to be studied, some of which will probably 
require alternative representations of solutions. 

First, there are constrained problems in function 
optimization. One method for handling constraints is to 
give a solution which does not satisfy the constraints a 
very poor value and include it in the population. Also, it 
may be possible to design a measure of how far outside the 
constraints a solution falls. Then a separate population of 
invalid solutions could be maintained in order to use the 

Even more interesting are set problems. As mentioned 
in Chapter 2, if the solution space of a problem 
consists of subsets of one superset, the obvious 
representation is binary loci which indicate inclusion of 
genetic algorithms are best used when the evaluation of a 
solution depends upon individual values in a natural binary 
representation. Set problems present a promising area of 
application for genetic algorithms. However, the simplest 
representation is not feasible if the superset is very 
large. Even if the superset is small, this representation 
may not lead to good genetic algorithms. 

Consider the knapsack problem, one of a class of 
computationally complex set problems (Aho, Hopcroft, and 
Ullman, 1974). The problem is the following: given a 
positive integer M and a set Q of pairs of positive 


integers, $Q = \{(p_1, c_1), \ldots, (p_n, c_n)\}$, 

Here the representation mentioned above for problems of size 
n = 100 will require chromosomes of length 100. It might be 
expected that a genetic algorithm using such a 
representation would form a valuable heuristic, supplying 


approximate solutions in linear time. However, many 
specific heuristics have been developed for this problem and 
others like it (Korte, 1979). Comparisons reveal that the 
heuristic method of Ibarra and Kim (1974; 1975) can achieve 
far better solutions than the genetic algorithm in less than 
one tenth the time. 
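
The binary set representation discussed here can be illustrated with a small evaluation function. The sketch assumes the usual form of the knapsack problem (choose a subset maximizing total profit p while total cost c does not exceed M) and assigns a value of zero to subsets that exceed M, in the spirit of giving invalid solutions a very poor value. The name `knapsack_value` and the penalty rule are assumptions of this illustration.

```python
def knapsack_value(bits, items, capacity):
    """Evaluate a subset encoded as one bit per item; infeasible subsets score 0."""
    profit = sum(p for b, (p, c) in zip(bits, items) if b)
    cost = sum(c for b, (p, c) in zip(bits, items) if b)
    return profit if cost <= capacity else 0

# Hypothetical three-item instance: (profit, cost) pairs and capacity M = 8.
items = [(10, 5), (6, 4), (7, 3)]
value = knapsack_value([1, 0, 1], items, 8)
```

Evaluation takes time linear in the number of items, but the effect of any one bit on the value depends on which other bits are set, through the capacity constraint.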

It seems likely that for most difficult set problems, 
the successful application of genetic algorithms will depend 
upon innovative representations and corresponding changes to 
the operators within the algorithms. In addition, research 
on such problems would probably benefit from the comparison 
of genetic algorithm performance to the performance of a 
numerical method such as that of Davies, Swann, and Campey 
(Swann, 1964) adapted for use in Hamming space. 

Genetic algorithms for function optimization emphasize 
overall best performance in the optimization of environments 
characterized by a priori ignorance and static complexity. 
Since the mechanisms of genetic algorithms are taken from 
natural systems, which operate within environments of 
dynamic ignorance and uncertainty, genetic algorithms may 
prove to be most valuable on time-varying or probabilistic 
environments. If so, measures such as online performance 
will be of primary importance. In particular, genetic
algorithms for tracking time-varying environments have not 
yet been thoroughly analyzed. Haploid and diploid genetic 
algorithms have been compared on time-varying environments. 
The haploid algorithm exhibited good online performance, but 


has not yet been tested against other, non-genetic methods. 
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APPENDIX A 


FUNCTION OPTIMIZATION GLOSSARY 


Bounds: The bounds of a function are maximum or minimum 
values for each x-value of the function. These provide 
outer limits on acceptable solutions. Constraining
equations of the forms x_i <= m, x_i >= m, x_i < m and x_i > m
define bounds. 

Concavity: A function is strictly upward concave if the
line segment joining any two points on the function falls on 
or below the function. 

Constraints: A function is constrained if solutions
must satisfy conditions other than simple bounds.
Additional equations, each involving several dimensions, are
a common example of constraints.

Continuity: A function f is continuous at a point
p = (p_1, ..., p_n) if f is defined at that point and, for
x = (x_1, ..., x_n),

    lim_{x -> p} f(x) = f(p).

Convexity: A function is strictly upwards convex if the 
line segment joining any two points on the function lies on 
or above the function. 

Dependence: see Linear Independence. 

Descent Methods: see Numerical Function Optimization 
Methods. 


Differentiability: A function f is differentiable at a
point x = (x_1, x_2, ..., x_n) if, for i = 1, ..., n,

    lim_{h -> 0} (f(x_1, ..., x_i + h, ..., x_n) - f(x_1, ..., x_i, ..., x_n))/h

exists.

Dimensionality: Dimensionality refers to the number of 
dimensions in the Euclidean space over which a function is 
defined. 

Direct Search Methods: see Numerical Function 
Optimization Methods. 

Euclidean space: The notation (x_1, ..., x_n) used for a
solution point is a representation of that point in
n-dimensional Euclidean space. Two points are close 
together in Euclidean space if the differences between their 
respective x-values on each dimension are small. 

Gradient Methods: see Numerical Function Optimization 
Methods. 

Hamming space: A discrete binary representation of a 
solution point is a representation of that point in Hamming
space. Two points are adjacent in Hamming space if they
differ by only one bit (0 or 1). "Space" here refers not to a
mathematical vector space, but simply to an encoded set of 
points with a distance measure. 
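
The distance measure and the adjacency relation described
above can be sketched as follows. The function names and the
4-bit unsigned binary encoding are assumptions chosen for
illustration. The example shows that two points which are
adjacent as integers can be far apart in Hamming space.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_neighbors(point):
    """All bit strings adjacent to `point`, i.e. differing in exactly one bit."""
    flip = {"0": "1", "1": "0"}
    return [point[:i] + flip[point[i]] + point[i + 1:] for i in range(len(point))]


# Assumed encoding: 4-bit unsigned binary integers.
seven, eight = format(7, "04b"), format(8, "04b")   # "0111" and "1000"
print(hamming_distance(seven, eight))   # 4: every bit differs
print(hamming_neighbors("0111"))        # ['1111', '0011', '0101', '0110']
```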

Independence: see Linear Independence. 

Indirect Methods: Indirect optimization methods are 
methods which do not employ search. The optimum is computed 
without evaluating individual points of the function. 
Solving a linear equation and its linear constraint 
equations by elimination is an indirect method. Computing 
the critical points of a function by finding points where


the function derivatives are zero is also indirect. 
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Linear Independence: A function is linearly independent 
over a domain of variables if it can be written as the sum
of subfunctions, each of which has a domain of at most one
variable. f(x,y) = 3x^2 + y^2 is linearly independent over x
and y. A function which cannot be reduced to such a sum has
nonlinearities, or dependencies, between two or more
variables. f(x,y,z) = x^2 + 2x*y + z^2 has two dependent
variables, x and y.
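
A dependency between two variables can be detected
numerically. A function of x and y is a sum of one-variable
subfunctions exactly when every mixed difference
f(x1,y1) - f(x1,y0) - f(x0,y1) + f(x0,y0) is zero. The sketch
below checks this on a finite grid, so a result of False
shows only that no dependency was found on that grid. The
function name, the grid, and the two test functions are
chosen here for illustration.

```python
from itertools import product


def has_interaction(f, xs, ys, tol=1e-9):
    """True if some mixed difference of f over the grid xs x ys is nonzero."""
    for (x0, x1), (y0, y1) in product(product(xs, repeat=2), product(ys, repeat=2)):
        mixed = f(x1, y1) - f(x1, y0) - f(x0, y1) + f(x0, y0)
        if abs(mixed) > tol:
            return True
    return False


grid = [-2, -1, 0, 1, 2]
separable = lambda x, y: 3 * x**2 + y**2      # sum of one-variable subfunctions
dependent = lambda x, y: x**2 + 2 * x * y     # the term 2*x*y couples x and y
print(has_interaction(separable, grid, grid))   # False
print(has_interaction(dependent, grid, grid))   # True
```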

Linearity: see Linear Independence.

Modality: Modality refers to the number of peaks (in 
maximization problems) or valleys (in minimization problems) 
of a function. Let x_opt be the location of the global
maximum of a function f. Assume f is to be maximized. Let
x and y be other points for which f is defined.

Unimodal functions have one relative maximum value:
there exists a path x to x_opt for which f is nondecreasing.
Strictly unimodal functions have no plateaus, but may
have ridges: there exists a path x to x_opt for which f is
increasing.
Strongly unimodal functions have only straight ridges:
f is increasing on the line segment between x and x_opt.
Linearly unimodal functions have no ridges, but may 
have plateaus: f is unimodal on the line segment between any 
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Nonlinearity: see Linear Independence. 
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Numerical Function Optimization Methods: 

Gradient methods (also called Descent methods) use 
information on derivatives to locate the direction of 
greatest change of the function. New points are located by 
moving in the direction of greatest improvement from 
previous points. 

Direct Search methods do not employ gradient 
information. There are three main types: Tabulation 
methods, which "box in" the optimal point and then reduce 
the size of the box, Sequential methods, which repeatedly 
evaluate points at the vertices of geometrical figures, and 
Linear methods, which use a set of direction vectors and 
repeatedly perform linear searches along those directions. 

Tabulation Methods: see Numerical Function Optimization 


Methods. 
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APPENDIX B 


BIAS OF REMAINDER STOCHASTIC SAMPLING WITHOUT REPLACEMENT 


Proposition: Sampling without replacement from the r_i,
i = 1, ..., n, is almost always biased.


Proof:

The stochastic portion of remainder stochastic sampling is 
altered as follows to yield sampling without replacement 
(see Figure 4.3). If i is sampled, r_i (= fraction(i)) is
replaced by 0 in the distribution and the range R for the
random numbers is reduced by r_i.


E[f_i] = E[w_i + g_i]

       = w_i + E[g_i].


E[g_i] = 0(P(g_i = 0)) + 1(P(g_i = 1))


Let z_j represent the r value of the element sampled at time
j. Then the probability of selecting element i is equal to
the sum of the probabilities of selecting i at each time j, 
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the first element (the one which has the maximum r value), 
at least one of the terms must be strictly less than one. 


Hence for i = 1,


Thus 


In general, as long as the non-zero r_i are not all equal (a
finite subset of the possible cases), the elements with 
larger r values will be biased towards underselection and 


those with smaller values towards overselection. 
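
The bias can be checked on a small case by exact enumeration.
The sketch below is illustrative only. It assumes that each
draw selects element i with probability proportional to its
current r value, that a selected element's r value is then
replaced by 0, and that the number of draws equals the sum of
the r values. The function name and the sample r values are
hypothetical.

```python
from fractions import Fraction


def inclusion_probabilities(r, draws):
    """Exact P(g_i = 1) for `draws` selections without replacement,
    each made with probability proportional to the remaining r values."""
    probs = [Fraction(0)] * len(r)

    def recurse(remaining, weight, left):
        # `remaining` maps index -> r value still in the distribution;
        # `weight` is the probability of the draws made so far.
        if left == 0 or not remaining:
            return
        total = sum(remaining.values())
        for i, ri in remaining.items():
            p = weight * ri / total
            probs[i] += p
            rest = {k: v for k, v in remaining.items() if k != i}  # r_i becomes 0
            recurse(rest, p, left - 1)

    start = {i: Fraction(x) for i, x in enumerate(r) if x > 0}
    recurse(start, Fraction(1), draws)
    return probs


r = [Fraction(9, 10), Fraction(3, 5), Fraction(1, 2)]   # sums to 2, so 2 draws
probs = inclusion_probabilities(r, draws=2)
for r_i, p_i in zip(r, probs):
    print(f"r = {float(r_i):.3f}   P(selected) = {float(p_i):.3f}")
```

For these values the element with the largest r is selected
with probability 111/140, which is below its r value of 9/10,
while the two smaller elements are selected with probability
above their r values. When all r values are equal the
probabilities match the r values and the bias disappears.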
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APPENDIX C 


MEAN SQUARE ERROR OF THREE SAMPLING METHODS 


Assume the notation of Chapter 4. Thus for i = 1, ..., n,
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The differences between the mean Square errors for the three 
methods are examined by breaking each comparison into cases. 
This sheds light on which conditions render the values equal 


and which provide strict inequality. 
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Proof: 
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Case III: R=m. 
This occurs only when e_i = r_i for i = 1, ..., n, in which
case the two methods are identical. Therefore


MSE_S = MSE_RS.


Summary: MSE_S >= MSE_RS;


in particular, MSE_S = MSE_RS

Proposition: MSE_RS >= MSE_D.


Proof:
Case I: R = 0.
In this case, the deterministic and remainder stochastic
methods are identical.
Therefore MSE_RS = MSE_D.
Case II: R =
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APPENDIX D 


GLOBAL OPTIMIZATION METHODS 


(This appendix is not intended to supply full descriptions 
of the algorithms employed. Only those parameters which are 
specific to the implementation are mentioned.) 

PRS (pure random search). 10000 uniform random points 
in the space are evaluated. 
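
A minimal sketch of pure random search follows. It assumes
that the function is to be minimized and that the space is a
box given by one (lower, upper) pair per dimension. The
function name, the seed parameter, and the sphere test
function are illustrative and are not part of the
implementation used in the experiments.

```python
import random


def pure_random_search(f, bounds, evaluations=10000, seed=None):
    """Evaluate uniform random points in the box `bounds`; return the best one."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(evaluations):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        fx = f(x)
        if fx < best_f:          # minimization is assumed
            best_x, best_f = x, fx
    return best_x, best_f


sphere = lambda x: sum(v * v for v in x)    # hypothetical test function
best_x, best_f = pure_random_search(sphere, [(-5.0, 5.0)] * 2, seed=0)
print(best_x, best_f)
```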

CRS_c (creeping random search, convergence mode). CRS
evaluates 30 points which are normally distributed around a
randomly chosen initial point, with standard deviation
0.2*(U - L), where U and L are the upper and lower bounds of
x_1. The center of the distribution is then moved to the best
point found and the square root of the distance moved is
adopted as the new standard deviation. The distribution
continues to move every 30 evaluations until two moves of
distance less than 2.e-10 (0.02 for Function 9, 2 for
Function 10) occur. Then the process repeats from a new
random point.
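
One run of the convergence mode, from a single starting
point, might be sketched as follows. Several details are
assumptions made for this sketch and are not stated above:
the function is minimized, the initial point is itself
evaluated, U and L are taken from the first coordinate,
sampled points are clipped to the bounds, the center stays in
place (a move of distance zero) when no sampled point
improves on it, and the two small moves must be consecutive.
The evaluation budget is a safeguard added here.

```python
import math
import random


def creeping_random_search(f, lower, upper, tol=2e-10, batch=30,
                           max_evaluations=10000, seed=None):
    """One run of creeping random search from a random starting point."""
    rng = random.Random(seed)
    center = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    f_center = f(center)
    evaluations = 1
    sigma = 0.2 * (upper[0] - lower[0])     # 0.2*(U - L), first coordinate assumed
    small_moves = 0
    while small_moves < 2 and evaluations + batch <= max_evaluations:
        best, f_best = center, f_center
        for _ in range(batch):
            # Normally distributed point around the center, clipped to the box.
            x = [min(max(rng.gauss(c, sigma), lo), hi)
                 for c, lo, hi in zip(center, lower, upper)]
            fx = f(x)
            if fx < f_best:
                best, f_best = x, fx
        evaluations += batch
        moved = math.dist(center, best)     # zero when no point improved
        center, f_center = best, f_best
        sigma = math.sqrt(moved)            # new standard deviation
        small_moves = small_moves + 1 if moved < tol else 0
    return center, f_center, evaluations


sphere = lambda x: sum(v * v for v in x)    # hypothetical test function
print(creeping_random_search(sphere, [-5.0, -5.0], [5.0, 5.0], seed=1))
```

The full method would call this routine repeatedly from new
random points until the overall evaluation budget is spent.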


CRS_f (creeping random search, fixed starts mode). If
CRS were run on 200 starting points for a total of 10000
function evaluations, there would be only one movement of
the distribution from each point. So CRS_f employs 100
starting points, allowing 100 function evaluations from each


point. The other features of the method are described under 
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DSC, (Davies, Swann, and Campey method, convergence 
mode). DSC performs a linear optimization in each of the d
directions in the space, using a random starting point and
an initial step size of 0.1*(U_i - L_i) in each direction. U_i
and L_i are the upper and lower bounds of each dimension i.
Then a new set of d orthogonal directions is chosen to
reflect the direction of greatest improvement in the
previous iteration, and a new iteration of linear
optimizations is performed. When one iteration yields a best
point less than 2.e-10 (0.02 for Function 9, 2 for
Function 10) from its initial point, another random starting
point is selected and the process repeats.

DSC_f (Davies, Swann, and Campey method, fixed starts
mode). DSC searches for 50 function evaluations on each of
200 random starting points. The iterations of the method are
explained under DSC_c.

GA_c (genetic algorithm, convergence mode). A haploid
genetic algorithm with population size 200, crossover rate
0.6, mutation rate 0.001, and deterministic parent selection
is run repeatedly to convergence on random initial
populations. Convergence for CRS and DSC is defined as the
limiting of search to a region of radius 2.e-10 (0.02 for
Function 9, 2 for Function 10) around a point. Since the
genetic algorithm is searching with a resolution of 1.e-10
(0.01 for Function 9, 1 for Function 10) between points, a
definition of convergence similar to that of the other
methods would be the reduction of search to within a region
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